PrettyGoodConfigurationSystem
From Epowiki
A Pretty Good Configuration System Even Socrates Might Like
How do you configure your system? I've worked on configuration many times, but I've never done it right. I have come closer this time, but it's not perfect yet.
Ok, so what?
Here's what you'll get if you open up the prize hidden inside:
- You get a highly configurable system where every stakeholder in the system gets a chance to configure the system in a rational, well defined order using a powerful general mechanism. You can configure the same thing from the command line, from the environment, from a configuration file, or over a command port at runtime.
- You have an automatic way to provide an interactive interface to every library in an application over multiple interfaces (telnet, http, command line, etc). This means you can configure anything in an application at runtime and you can access all of an application's "debug" interfaces at runtime.
- You get a system describable at runtime by applications in their own code. I like this better than building meta systems from configuration files. It's simple enough that programmers might just consider doing it because the benefits are many and the costs are few. The tricks are mostly in how everything fits and works together, not in the amount of programmer work invested.
Just What is Configuration?
Anything in a program that you can set or get.
Configuration is all the really sexy stuff in programming like setting log levels for different parts of an application, setting port numbers, setting host names, setting max queue sizes, max error thresholds, turning on and off features, injecting faults, running tests, invoking functions, getting application metrics, and getting values from different layers of a program so you can tell what is going on.
This kind of stuff is certainly boring to non-programmers, but it can be very interesting to programmers. Mostly what's of interest is how badly configuration is usually done. Configuration doesn't get much thought so it ends up being a confusing mix of poorly thought out hacks.
Many times I have witnessed conversations that went something like this:
Socrates: We had a sev 1 bug filed by the Hellenic Alliance. Spartan
troops passed through a valley they were monitoring and no
alarm was triggered! The system is still up and running on
host Euthyphro. What happened and how can we fix it?
Phaedo : How should I know?
Socrates: Can't we go look?
Phaedo : Look at what?
Socrates: You know, the logs and the programs and see what happened?
Phaedo : No, we don't have anything like that. Just give us the
complete state of the system and exactly what happened and
we'll recreate the problem.
We'll add some debug, put the processes in the debugger,
and we'll figure out what happened. It should take about a week
to get everything ready and completed. We don't have a test
system as big as the real system so it will take some setup time
and time to get all the hardware.
Socrates: But we have the system up and running now. Can't we look and
try to recreate the problem right now?
Phaedo : No, like I said, we can't do that sort of thing.
Socrates: How are we supposed to give you the complete state of the
system when you don't keep that sort of information around?
Phaedo : Uh, well, just give us what you got and we'll do our best.
Socrates: This is a really important customer. We need to move fast.
What can we do to fix this process in the future?
Phaedo : There's nothing broke! What do you expect us to do? All that
logging and other stuff slows the program down, takes up valuable
programmer time, and adds complexity.
Socrates: You can't figure how to make it work? We are going to lose a
big customer here if we can't get to a root cause by the close
of business today.
Phaedo : Well mister smarty pants, what would you do?
Socrates: Nothing, I have a feeling I won't be at this job much longer
anyway.
Why is Configuration a Big Deal?
For many programs configuration is easy. If you are writing a word processor and you can read all your configuration from a single file and any changes can be made from your UI then you probably have few problems.
But there's a whole other class of applications: long running server processes, where configuration is a big deal.
These processes are started once and should ideally run forever. They handle all the stuff people don't see, but will certainly notice when broken, like monitoring hardware, fault isolation, implementing routine maintenance, keeping statistics, tweaking allocation algorithms to ensure fairness, and a thousand other functions nobody on the outside will ever know about.
The Configuration Iceberg
What people don't realize is that it's these kinds of processes that make up the bulk of most complex systems. The tip of the iceberg is what people see through the GUI. What's underneath forms the real part of a system. Each process alone may be more complex than what the user sees through the system's GUI.
Servers are a System's Adaptive Unconscious
From How do you feel? How do you know?:
Humans possess a powerful set of psychological processes that are critical for survival and operate behind the conscious mental scene. These processes, called the "adaptive unconscious," are intimately involved in how we size up our world, perceive danger, initiate action, and set our goals. It is the unconscious that allows us to learn our native language with no conscious effort, recognize patterns in our environments while we think about something else, and develop reliable intuitions to guide our actions.
All the long lived server processes in a system form the adaptive unconscious of the system.
We usually don't have access to our unconscious, but when something goes wrong we need to be able to delve into the unconscious, figure out what's going on, and fix it. That's the role of a good configuration system. Unfortunately, psychotherapy is not as simple :-)
To Solve Problems We Need Our Questions Answered
When a problem rears up and threatens to chomp us in half, the only way developers have of fighting back is by looking at logs, poking into the insides of the program, changing configuration options, running tests, and repeating the process until we can figure out what's wrong.
This is a high art and eventually separates the systems that are reliable from those that aren't.
Be Especially Afraid of Alien Probes
Imagine you are working on the Mars probe team. Any problems can end the mission and waste millions of dollars and years of hard work. You are millions of miles away. Communication is painfully slow and low bandwidth.
These folks built a system with an unconscious laid bare. They could apply runtime patches. They could poke and prod every concealed corner. They knew what was happening everywhere at all times. And it saved their butt because the unexpected did happen and they needed all the capabilities they built into the system to fix it. See Mars Rover Problems for more details.
Initially, most projects don't go to this same level of effort to provide configuration, debugging, and monitoring features. But over time more and more features are added because that's what it takes to figure out what's going on in a complex system deployed in the field. If you can't figure out what's going on you can't fix it.
How might you handle your system's configuration responsibilities? It's impossible to say in general because every system is different. What you need for a set top box delivered to 100,000 homes is different from what you need for an Air Traffic Control System, which is different from what you need for MS Word.
When creating your configuration system here are some general ideas to think about.
What is Configuration?
I am going to distill configuration to: the getting and setting of attribute-value pairs.
Good old attribute-value pairs. Most data related tasks that don't require a rigid schema eventually use attribute-value pairs in one form or another.
And we usually don't want to use schemas because they slow us down. There is always a faction wanting to tie configuration to schemas, but you don't need to. Most configuration ends up outside the schema anyway, so why even bother starting with it? Is that too pragmatic?
Going forward I am going to talk about a global Cfg object containing the configuration for a system.
The First Law of Configuration
All configuration attributes should be settable and gettable from each and every access path.
Usually some attributes can be set on the command line, some can be set in the environment, some can be set in a configuration file.
Nope. All attributes should be first class configurable citizens.
The Second Law of Configuration
Every library should be able to present an operational interface to users, enabling them to inspect and change the behaviour of the library.
How Should Attributes be Named?
Hierarchical names are useful because you can use different namespaces to prevent naming conflicts. Because of our experience with file systems using a '/' as a separator seems natural enough.
A simple policy is to use each library's namespace as the root of their attribute namespace.
To set a log level for my App you would do something like:
/App/loglevel=debug
All Values are Strings
We don't try to type values. All values are kept as strings.
The configuration interface has conversion accessors so you can access a value in a desired format. For example, getInt("/App/age") would return the string value of the "age" attribute converted to an integer.
The user provides typing data on use, not the configuration system. This has the advantage of being simple and the disadvantage of not being perfectly safe. Oh well.
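Here is a minimal sketch of what "typing on use" could look like. The class layout and the bool-plus-out-parameter shape of getInt are my assumptions for illustration; only the getInt name and the strings-only storage come from the text above.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical strings-only store with a conversion accessor.
class Cfg {
public:
    void set(const std::string& name, const std::string& value) {
        values_[name] = value;
    }

    // Copies the raw string into `out`; returns false if the attribute is unset.
    bool get(const std::string& name, std::string& out) const {
        auto it = values_.find(name);
        if (it == values_.end()) return false;
        out = it->second;
        return true;
    }

    // Converts on use. Returns false if the attribute is unset or its
    // string is not a whole integer; `out` is left untouched in that case.
    bool getInt(const std::string& name, int& out) const {
        std::string raw;
        if (!get(name, raw)) return false;
        try {
            std::size_t used = 0;
            const int parsed = std::stoi(raw, &used);
            if (used != raw.size()) return false;  // trailing junk such as "42x"
            out = parsed;
            return true;
        } catch (const std::exception&) {
            return false;  // not a number, or out of range for int
        }
    }

private:
    std::map<std::string, std::string> values_;
};
```

The "not perfectly safe" part shows up as the false return: the caller finds out about a bad value when it asks for the integer, not when the value was set.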
What is Configured?
It's an illusion that a program is a single whole thing. A program is made up of separate libraries, much like our brains are made up of different components.
A library will likely be used across many applications. An application may use dozens of libraries. Each of these libraries must be able to:
- be configured
- react to configuration changes
- provide operations for accessing and testing the system
This is not easy and is rarely if ever possible. A program usually takes a few command line arguments and then uses those to configure the libraries it is using. And that's usually the end of it.
You can only configure what the programmer has had the stamina to add to the command line parser. Command line parsers are usually hard to extend so developers usually don't bother much.
If you want each library to expose which attributes are configurable you have to make it easy for developers to:
- name the attributes
- tell what the attributes mean
- react to changes in attributes
- provide operations
When Does Configuration Happen?
Configuration generally happens at two different times:
- creation
- runtime
Often configuration only takes effect at creation. This is the easiest approach, but is a real pain for debugging systems. You want to change something and have the change take effect now.
You may remember changing kernel parameters, compiling a new kernel, and then rebooting to see the changes take effect. Yuck. Wasn't it sweet when you could change parameters at runtime?
There are several ways around the "configuration only matters at creation time" problem. Some systems check a configuration file on a periodic basis for changes and reload the file if it has been changed. This has a high latency as you don't know how often the app will check for changes and it is kind of crude. You are asking an application writer to be able to read a configuration file, figure out what changed, and apply all the changes correctly.
Frankly, it's too hard and almost nobody will do it.
Representing Configuration with a Properties Singleton
One easy configuration approach is to make a singleton Properties object responsible for access to global configuration data.
Using this approach each and every part of a program can have access to global data. You don't have to pass data endlessly through constructors or create "context" objects that bind every part of a program to every other part.
In C++, create something like the Java Properties object loadable from an XML file. The XML file is very simple, something like:
<props name="">
<attr-name value="${VAR}string | string" default="${VAR}string | string" />
<props name="">
...
</props>
</props>
A props element contains attributes and other props elements. "attr-name" is the name of the attribute. This is a very simple format and every library understands the same format.
A properties file can be loaded from a string or a file.
The Properties object has a lot of nice convenience methods for getting and setting attributes using different types (int, time, etc) and for dumping the configuration so you can see what is going on.
To get a value you would do Cfg::get("/namespace/loglevel"). To see if an attribute exists you would do Cfg::isExist("/namespace/loglevel"). To set a value using an integer instead of a string you would do Cfg::setInt("/namespace/loglevel", 3).
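A rough sketch of the singleton shape, assuming static methods in front of a single hidden instance. The method names follow the calls mentioned above; the one-argument isExist, the empty-string result for a missing attribute, and the lack of any locking are my simplifications, not a description of the real Properties class.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical global configuration object. Not thread safe as written.
class Cfg {
public:
    // Returns the stored string, or an empty string if the attribute is unset.
    static std::string get(const std::string& name) {
        const std::map<std::string, std::string>& v = instance().values_;
        auto it = v.find(name);
        return it == v.end() ? std::string() : it->second;
    }

    static bool isExist(const std::string& name) {
        return instance().values_.count(name) != 0;
    }

    static void set(const std::string& name, const std::string& value) {
        instance().values_[name] = value;
    }

    // Values are still stored as strings; the integer is converted on the way in.
    static void setInt(const std::string& name, int value) {
        set(name, std::to_string(value));
    }

private:
    Cfg() = default;

    // Constructed on first use, so it is safe to call during static initialization.
    static Cfg& instance() {
        static Cfg cfg;
        return cfg;
    }

    std::map<std::string, std::string> values_;
};
```

Any part of the program can then call Cfg::get("/namespace/loglevel") without being handed a context object.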
Special ${VAR} Properties Syntax
As the Properties file is loaded it does some special checking. The "${VAR}" syntax means look in the Cfg object for the variable named by VAR to find the value of this attribute.
If the variable is found then the result is textually substituted in. But if the variable is not found then the value of the "default" attribute is used for the value of the attribute.
At this point the Cfg object will have been loaded with all relevant environment variables so you have access to any environment variables too.
What this accomplishes is:
- Allows other variables to cleanly override configuration file definitions.
- Allows parts of a value to come from other variables. For example, you could do:
<appCfgPath value="${ROOT}/app.xml" default="/app/app-conf.xml" />
In this example you could define ROOT in your environment as pointing to your development directory. If ROOT wasn't defined then the default value "/app/app-conf.xml" would be used.
This is a simple and powerful approach. It gives you a way to parameterize your configuration files and at the same time allows other variables to take precedence over the configuration file.
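The substitution rule can be sketched in a few lines. The function name resolveValue and the plain map standing in for the Cfg object are assumptions for illustration. The behavior follows the description above: a found variable is substituted textually, and a missing one makes the whole attribute fall back to its default.

```cpp
#include <cassert>
#include <map>
#include <string>

// Expands every ${VAR} in `value` using `vars`. If any referenced variable
// is missing, the attribute's `fallback` (its "default") is returned instead.
std::string resolveValue(const std::string& value,
                         const std::string& fallback,
                         const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (pos < value.size()) {
        const std::size_t open = value.find("${", pos);
        if (open == std::string::npos) {
            out.append(value, pos, std::string::npos);  // no more variables
            break;
        }
        const std::size_t close = value.find('}', open + 2);
        if (close == std::string::npos) {
            out.append(value, pos, std::string::npos);  // unterminated: keep as text
            break;
        }
        out.append(value, pos, open - pos);  // literal text before the variable
        auto it = vars.find(value.substr(open + 2, close - open - 2));
        if (it == vars.end()) return fallback;
        out += it->second;
        pos = close + 1;
    }
    return out;
}
```

With ROOT set to "/home/dev", the appCfgPath example resolves to "/home/dev/app.xml". With ROOT unset it resolves to "/app/app-conf.xml".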
The Many Faces of Configuration
What makes configuration tricky is all the different contexts you need to configure from. Some examples are:
- Deployment
- Local tests
- System tests
- Cron
- Scripts
- Other Applications
Your deployment may be totally locked down and all paths are full paths to the official locations for files. This is different from the configuration used to test with on a development machine because you may be dealing with several different program versions at once and you may also have the software officially installed on your development machine.
So you need a way to select different sources of configuration information.
In some circumstances using command line overrides may be the easiest approach. In scripts, for example, passing command line variables is easy. But you could never pass large numbers of configuration options by hand.
Tests may test many different configuration profiles so keeping configuration in files is the easiest approach.
What is clear is that a flexible configuration approach makes it easy to adapt to most any situation.
Where is Configuration Stored?
- One File
- Separate Files
- Environment
- Scripts
- Command Line
Configuration could be stored in a number of separate files or in one large configuration file. Individual libraries may have their own configuration file and an application is likely to have configuration in one big file.
It shouldn't make a difference where configuration comes from.
Who Uses the Configuration System?
- Developers
- Field Support
- QA
- Advanced Customers
- Operations
- Manufacturing
All these groups may need to debug problems at one time or another. Keep all these types of people in mind when you are creating your Configuration system. Help them do their jobs.
Taking Configuration Precedence
The idea of precedence, that is, whose configuration values should be used as the real configuration values is a tricky issue when configuring a system. Ideally, you would like a system where a nice set of defaults would be in place in such a way that each could be overridden by successive levels of authorities.
In order from highest to lowest precedence, here are the sources of configuration in a program:
- Runtime configuration.
- Command Line values
- Environment Variables
- Application Configuration Files
- Library Configuration Files
- Hard Coded Program Values
Bootstrapping Makes Precedence Tricky
Bootstrapping refers to the fact that the order in which configuration attributes are set into Cfg cannot match the precedence order, which is why we have the idea of precedence in the first place.
You have to worry about the order in which configuration is applied. If you let a lower precedence attribute override a higher precedence attribute then your configuration is wrong.
Let's say your hard coded default command port is 5002. On the command line you change it to 5004. If the command line doesn't override then you won't be able to talk to your process over the command port.
Now let's say there's also a command port in the application configuration file. The bootstrapping issue stomps on us because many settings are passed in on the command line. For example, you must parse the command line to get the path to the application configuration file. It's natural, if you are parsing the command line, to put the attributes found on the command line into Cfg. But if you naively load the application configuration file over what's in Cfg then you'll clobber the command line attributes with the application attributes.
Another bootstrap issue is library initialization. Only libraries know which attributes they support, so normally we would need to call out to a library to get their attribute list. The problem is when the program starts libraries haven't been initialized. And libraries can't be initialized because we need the configuration to know what to create. Catch-22. We use a static constructor to get around this issue, but this means libraries will set Cfg with their hard coded defaults, which shouldn't take precedence over command line variables. But the command line is parsed after the static constructors run.
The command line order issue and the static constructor issue are why we need the idea of precedence. We have to bootstrap configuration yet we need to ensure we get the correct configuration when we are all finished.
How is Precedence Indicated?
Precedence is passed in the set attribute command. The set method might look like:
Cfg::set(string name, string value, string precedence);
The precedence level is evaluated when setting the attribute. If the precedence of the value being set is higher than the precedence used in the previous attribute set, then the attribute value is replaced. If not, then the attribute is not set.
Whoever is setting an attribute knows their precedence and passes it in the set command.
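A minimal sketch of a precedence-checked set, assuming each precedence name maps to a numeric rank. The names "hard coded", "cmdline", "env" and "app" are the ones used in this paper; "lib" and "runtime" are names I made up for the two remaining levels, and the bool return value is also an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Higher rank wins. Returns 0 for a name that is not a known precedence.
// "lib" and "runtime" are assumed names; the others come from the text.
inline int precedenceRank(const std::string& precedence) {
    static const std::map<std::string, int> ranks = {
        {"hard coded", 1}, {"lib", 2}, {"app", 3},
        {"env", 4}, {"cmdline", 5}, {"runtime", 6}};
    auto it = ranks.find(precedence);
    return it == ranks.end() ? 0 : it->second;
}

class Cfg {
public:
    // Stores the value only if nothing was stored before or the new
    // precedence is strictly higher than the stored one.
    // Returns true if the value was stored.
    bool set(const std::string& name, const std::string& value,
             const std::string& precedence) {
        const int rank = precedenceRank(precedence);
        if (rank == 0) return false;  // unknown precedence name
        auto it = entries_.find(name);
        if (it != entries_.end() && rank <= it->second.rank) return false;
        entries_[name] = Entry{value, rank};
        return true;
    }

    // Returns the winning value, or an empty string if the attribute is unset.
    std::string get(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? std::string() : it->second.value;
    }

private:
    struct Entry {
        std::string value;
        int rank;
    };
    std::map<std::string, Entry> entries_;
};
```

Because the check happens inside set, the order of the calls stops mattering: loading the application file after parsing the command line cannot clobber the command line value. This sketch follows the "higher than" rule literally, so a second set at the same precedence is rejected. Whether equal precedence should also replace is a policy choice you would have to make.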
Precedence In Program Execution Order
The next sections talk about each of the sources of configuration in the order they happen in the program. Note, this order is not the same as the precedence order.
Hard Coded Program Values Using Static Constructors
We start with defaults built into a program. For example, the default loglevel may be hard coded to critical so that the library won't overload the application with log data. The library expects the higher configuration levels to pick different defaults if it wants. The baked in library defaults are to ensure the library behaves sanely in the case where no other configuration was offered.
How does each separate library integrate into a single Configuration system? After all, if only the library knows about its configuration attributes, how would they ever be made known to the application so the application can set and get the configuration values?
There are many ways to approach this, but I think one of the simplest is using static constructors. Wait, aren't these evil? Not all the time. In this case construction order doesn't matter and there are no configuration dependencies between libraries.
By using a static constructor merely linking into a library gives the library a chance to configure itself into a global library list. You don't have to know about all the libraries you use. Libraries will automatically configure themselves.
Each library uses a static constructor which adds itself into the global Cfg object, providing the following information:
- Help text for each attribute.
- A menu of operations along with an object for implementing the operations.
- Initial values for each configurable attribute
- A change handler for any attribute changes they care about.
- A set and get handler for any attribute. This allows an attribute to be virtual. You can store attributes in application data structures if you wish.
- Type conversion operators for converting from string to integer, integer to string, and between other types.
- A visibility level for specifying an interest level at which attributes should be shown to users. With a program having so many attributes, they can't all be shown at once. The visibility level provides a way to indicate which attributes the user is most interested in.
- An indication if the attribute represents a configuration file path for the library.
From this information an application can provide a unified and powerful interface to all the libraries inside an application.
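Here is a sketch of the static constructor trick, cut down to just a default value and help text per attribute. AttrInfo, AttrRegistrar, registry() and the "Net" library are all hypothetical names. A real registration would also carry the handlers, operations and visibility level listed above.

```cpp
#include <cassert>
#include <map>
#include <string>

// What a library declares about one attribute (a small subset of the list above).
struct AttrInfo {
    std::string defaultValue;
    std::string help;
};

// Function-local static, so the registry exists before any registrar
// touches it, whatever order the static constructors run in.
inline std::map<std::string, AttrInfo>& registry() {
    static std::map<std::string, AttrInfo> attrs;
    return attrs;
}

// Constructing one of these at namespace scope registers an attribute
// before main() starts.
struct AttrRegistrar {
    AttrRegistrar(const std::string& name, const std::string& defaultValue,
                  const std::string& help) {
        registry()[name] = AttrInfo{defaultValue, help};
    }
};

// --- what a hypothetical "Net" library would put in one of its .cpp files ---
namespace {
const AttrRegistrar kLogLevel("/Net/loglevel", "critical",
                              "Log verbosity for the Net library");
const AttrRegistrar kCmdPort("/Net/cmdport", "5002",
                             "TCP port that accepts commands");
}  // namespace
```

By the time main() runs, the application can walk registry() to see every attribute of every library it linked, without naming any of them. One caveat: when a library is linked as a static archive, some linkers drop object files that nothing references, so the file holding the registrars needs to be pulled in somehow.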
Command Line
Command line values are the values you pass in on a program's command line. By having the next highest precedence for command line configuration you are allowing any user to set the values they want when they run the program, which is what a user wants. They don't want to know about your nifty configuration files.
By the time the command line is parsed all the libraries have configured themselves into the Cfg object with a precedence of "hard coded." Anything from the command line will override these settings.
The command line accepts attribute-value pairs. Don't bother with options like -c or whatever. A program has way too many configuration options to map to command line options, so don't even bother. Pass only attribute-value pairs on the command line. For example:
app /namespace/loglevel=debug /namespace/cmdport=5002
This way you can simply and easily support any number of command line options. Split up each AV pair and place it in the Cfg object with a precedence of "cmdline." Some of the command line options are usually paths to configuration files.
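Splitting the pairs is about as simple as parsing gets. A minimal sketch, assuming the arguments have already been copied out of argv. The parseArgs name and the "rejected" list are mine.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Splits each "name=value" argument at the first '='. Arguments with no '='
// or with an empty name are appended to `rejected` (if given) so the caller
// can report them.
std::map<std::string, std::string> parseArgs(
        const std::vector<std::string>& args,
        std::vector<std::string>* rejected = nullptr) {
    std::map<std::string, std::string> pairs;
    for (const std::string& arg : args) {
        const std::size_t eq = arg.find('=');
        if (eq == std::string::npos || eq == 0) {
            if (rejected) rejected->push_back(arg);
            continue;
        }
        pairs[arg.substr(0, eq)] = arg.substr(eq + 1);  // value may contain '='
    }
    return pairs;
}
```

In main you would build the vector with std::vector<std::string>(argv + 1, argv + argc) and then set each pair into Cfg with the command line precedence.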
Environment Variables
Environment variables are the next level of override. You should be able to set every configuration value in your environment.
One of the reasons libraries declare all their attributes with Cfg is so we can look for environment variable overrides for those attributes. Use the list of attributes registered with the Cfg object to check the environment for each attribute. If an attribute is found in the environment set it in Cfg with a precedence of "env."
Environment variable names often cannot contain slashes or other special characters so you can't use the standard hierarchical name format. One option is for environment variables to replace the / with a _ as the _ is usually acceptable. And because it looks funny, remove the initial _.
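The name mapping and the lookup could be sketched like this. The function names envNameFor and envOverride are assumptions. std::getenv is the standard library call.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// "/App/loglevel" -> "App_loglevel": drop the leading '/', then turn every
// remaining '/' into '_'.
std::string envNameFor(const std::string& attrName) {
    std::string env = attrName;
    if (!env.empty() && env[0] == '/') env.erase(0, 1);
    for (char& c : env) {
        if (c == '/') c = '_';
    }
    return env;
}

// Looks for an environment override of one registered attribute.
// Returns false, leaving `valueOut` alone, if the variable is not set.
bool envOverride(const std::string& attrName, std::string& valueOut) {
    const char* raw = std::getenv(envNameFor(attrName).c_str());
    if (raw == nullptr) return false;
    valueOut = raw;
    return true;
}
```

You would call envOverride once for each attribute the libraries registered. Note that this mapping is not reversible: "/App/log_level" and "/App/log/level" end up as the same environment variable, so attribute names need to avoid that collision.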
Application Configuration
Application configuration files are the primary configuration for the program. Remember, a program is at one level simply a container for a set of libraries. An application will have a preference for how each of these libraries are configured.
Load all your application configuration files into Cfg with a precedence of "app." The location of the configuration could have come from the command line, library configuration, or from the environment. In the Cfg object there should be a "load" command for this purpose.
Library Configuration
Each library may have its own default configuration file where it stores what it considers to be a reasonable configuration.
For all the attributes marked as configuration file paths, load in their configurations. This is done by the application rather than the library so we can assume all configuration is complete before starting the application.
Start the Application
Now that the configuration is complete, the application can start itself. There are also techniques for ensuring an orderly startup of all components, but this paper is already way too long. See System State Machine for more details.
Runtime Configuration
Runtime configuration is when someone changes configuration at, well, runtime. The program is up and running and you want to change it. For "real" programs with a GUI this isn't a big deal, but for server processes without a GUI it isn't very easy.
Registering for Changes
When we set a configuration attribute we want libraries to make use of the change. Often a library might get the attribute values at creation and never refresh the values. Or a library could get the value from Cfg every time it uses it, but that wouldn't be efficient if an attribute is used often.
The alternative is for a library to register for a notification of when an attribute changes. When the attribute changes, the library will get notification and the library can take steps to propagate the new value.
Any library can register for changes on any namespace. Registration can be for an attribute or higher in the namespace.
If your program is threaded then you will need to make sure that changes propagate safely into different threads.
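A small sketch of change registration, assuming handlers are plain callbacks keyed by a name prefix. The onChange name and the Handler signature are hypothetical. Handlers run on the thread that calls set, so this sketch does nothing about the threading problem mentioned above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class Cfg {
public:
    using Handler = std::function<void(const std::string& name,
                                       const std::string& value)>;

    // `prefix` may be a full attribute name or a namespace such as "/App".
    void onChange(const std::string& prefix, Handler handler) {
        handlers_.emplace_back(prefix, std::move(handler));
    }

    // Stores the value and notifies every handler registered at or above
    // `name`. Setting the same value again notifies nobody.
    void set(const std::string& name, const std::string& value) {
        auto it = values_.find(name);
        if (it != values_.end() && it->second == value) return;
        values_[name] = value;
        for (const auto& entry : handlers_) {
            if (covers(entry.first, name)) entry.second(name, value);
        }
    }

private:
    // True if `name` equals `prefix` or lies below it in the hierarchy.
    // "/App" covers "/App/loglevel" but not "/Application/loglevel".
    static bool covers(const std::string& prefix, const std::string& name) {
        if (name.compare(0, prefix.size(), prefix) != 0) return false;
        return name.size() == prefix.size() || name[prefix.size()] == '/';
    }

    std::map<std::string, std::string> values_;
    std::vector<std::pair<std::string, Handler>> handlers_;
};
```

A library that caches its log level would register on "/App/loglevel" once at startup and update its cached copy inside the handler, instead of asking Cfg on every log call.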
Command Port
How do you interact with a running process? Often it's through a command port of some kind. Configure each process with a TCP port that accepts commands.
You can, if you are clever, run different protocols over one command port. For example, you can connect via telnet over the command port. You can connect via http over the same command port. Or you could use SOAP and other options.
Command Port Interfaces
Libraries register the operations they support with Cfg. This allows a nice interactive interface to be automatically generated and presented to the user.
For example, when a user types in the help command they would receive a list of the supported commands along with usage and help information.
Over http the same information could be delivered as a nice html page.
With the information contained in Cfg the options for providing cool and useful interfaces are limited only by your imagination.
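To show how the help output can fall out of the registrations, here is a sketch of an operation table. The Operation struct, the CommandTable class and the one-line text protocol are all assumptions. The socket handling of a real command port is left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// What a library registers for one operation.
struct Operation {
    std::string usage;
    std::string help;
    std::function<std::string(const std::string& args)> run;
};

class CommandTable {
public:
    void add(const std::string& name, Operation op) {
        ops_[name] = std::move(op);
    }

    // Builds the text a "help" command sends back, one operation per line,
    // sorted by command name.
    std::string helpText() const {
        std::ostringstream out;
        for (const auto& entry : ops_) {
            out << entry.second.usage << " - " << entry.second.help << "\n";
        }
        return out.str();
    }

    // Runs one line of the form "<command> <args...>" and returns the reply.
    std::string dispatch(const std::string& line) const {
        const std::size_t space = line.find(' ');
        const std::string name = line.substr(0, space);
        if (name == "help") return helpText();
        auto it = ops_.find(name);
        if (it == ops_.end()) return "unknown command: " + name + "\n";
        const std::string args =
            space == std::string::npos ? std::string() : line.substr(space + 1);
        return it->second.run(args);
    }

private:
    std::map<std::string, Operation> ops_;
};
```

The same table could feed a telnet session, an html page, or a command line tool, since each of them only needs helpText and dispatch.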
Bulk Sending of Commands
You'll likely need to send a command to all programs. For example, let's say you want to check if all your programs are up and happy. You could send a heartbeat command to each process to check if they are all happy and healthy. Or if you are feeling the dark side you could send a command to kill all of your processes.
The program that starts and restarts all your programs is a good candidate for sending commands to all your processes.
Patching Code at Runtime
In C++ it's almost impossible to insert code at runtime, but in some languages you can download patches into a program to add new functionality. This is useful for tests and features that weren't built into your program.
Efficient Logging in a Separate Thread
One often used excuse against logging is that logging slows down a program. Done poorly it can. Done right there's no problem. I've implemented logging in extremely time sensitive applications, so it can be done.
Some tips:
- Preallocate logging buffers off a circular queue so memory is never allocated while logging. Memory allocation is a central lock point in most programs and causes a lot of latency problems.
- Queue logging into a separate thread. Delay formatting of attributes in that thread if you can. This removes the cost of logging from the main line of your program and into a potentially lower priority thread.
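The first tip can be sketched as a fixed-size ring of fixed-size records. LogRing and its parameters are hypothetical names. This sketch covers only the preallocated buffer: the hand-off to the logging thread needs synchronization (a lock or atomics) that is left out here, so as written it is only safe from a single thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Fixed-capacity queue of fixed-size records. All storage lives inside the
// object, so pushing a message never allocates memory.
template <std::size_t Slots, std::size_t RecordSize>
class LogRing {
public:
    // Copies the message into the next free slot, truncating it to
    // RecordSize - 1 characters. Returns false when the ring is full,
    // so the caller can count dropped messages.
    bool push(const char* message) {
        if (count_ == Slots) return false;
        char* slot = records_[(head_ + count_) % Slots].data();
        std::strncpy(slot, message, RecordSize - 1);
        slot[RecordSize - 1] = '\0';
        ++count_;
        return true;
    }

    // Copies the oldest record into `out`; returns false when empty.
    // This is the side the logging thread would call, where allocating
    // a std::string and formatting are affordable.
    bool pop(std::string& out) {
        if (count_ == 0) return false;
        out = records_[head_].data();
        head_ = (head_ + 1) % Slots;
        --count_;
        return true;
    }

private:
    std::array<std::array<char, RecordSize>, Slots> records_{};
    std::size_t head_ = 0;   // index of the oldest record
    std::size_t count_ = 0;  // number of records currently queued
};
```

Dropping on a full ring is one policy. Overwriting the oldest record is another, and which one you want depends on whether the first or the last messages before a failure matter more.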
Sending Metrics
Metrics are interesting things about your program you need to tell the world. Usually these metrics are gathered up and used for root cause analysis, performance profiling, auditing, debugging, what-if analysis, and more. They include information like error counts, packet counts, and bytes transferred.
Make it simple for programmers to send metrics by sending metrics over the same interface used for logging. Make a special format for metrics that can be parsed, say an attribute-value format. The logs can then be parsed for metrics and the metrics can be uploaded into a database for evaluation.
This is a low formality method that is easy yet works well.
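One possible attribute-value format, sketched with a formatter and the matching parser. The "METRIC" tag, the space-separated layout and the function names are my invention for illustration. Any format works as long as a log scraper can recognize it.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical marker that tells a log scraper this line carries metrics.
const char kMetricTag[] = "METRIC";

// Produces one log line such as "METRIC /Net/errors=3 /Net/packets=1200".
// Metric names must not contain spaces or '='.
std::string formatMetrics(const std::map<std::string, long>& metrics) {
    std::ostringstream out;
    out << kMetricTag;
    for (const auto& m : metrics) out << ' ' << m.first << '=' << m.second;
    return out.str();
}

// Parses one log line. Returns false, leaving `out` untouched, for ordinary
// log lines and for metric lines that are malformed.
bool parseMetrics(const std::string& line, std::map<std::string, long>& out) {
    std::istringstream in(line);
    std::string token;
    if (!(in >> token) || token != kMetricTag) return false;
    std::map<std::string, long> parsed;
    while (in >> token) {
        const std::size_t eq = token.find('=');
        if (eq == std::string::npos || eq == 0) return false;
        try {
            parsed[token.substr(0, eq)] = std::stol(token.substr(eq + 1));
        } catch (const std::exception&) {
            return false;  // value is not a number
        }
    }
    out.swap(parsed);
    return true;
}
```

The program only ever calls formatMetrics and hands the line to the logger. The parser belongs to whatever tool reads the logs and loads the database.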
But Wait, I Need to do X!
I know, you need a special configuration option for your system. Of course you do! When you build it just try to keep the overall configuration system in mind. Make whatever you do as global and powerful as possible so it works in all scenarios.
The End
The result of all this work, which is harder to write up than it is to do, is a very usable system for solving a lot of real life problems related to configuration and debugging problems in the field.
