Philosophical ramblings
Why? Two reasons, network administration and security.
From an administrative POV the server should be reliable and placed at a central point of the network, able to reach as many of the monitored items as possible directly, so that the exact point of failure can be easily determined. It can’t do much monitoring if it’s down due to hardware problems, and the fewer hops it needs to reach things, the smaller the chance that a single failure on the network hinders its ability to monitor the rest.
From a security POV this is where all data about intrusions, successful or not, should be gathered. This is where one would start looking in order to find the what, when and how of an intrusion. The server must be online and working in order to gather the data, so it should be a secure server, preferably not directly accessible from the outside, to shield it from DoS attacks. Also, maybe use a different OS for it. If, say, the network is made up of Linux servers and the attackers use some BigBad 0day exploit against them, this one running a BSD could stop them and give the administrators time to close down the network while still having enough data to decide later how to proceed.
So, stable and secure. Sounds like all the other servers. The catch is that this one isn’t the average production server: it doesn’t need to be open to users or the world, and only the network administrators should have access, so security isn’t that hard to achieve. Having a different OS and a different authentication mechanism from the rest of the network (no domain integration, no common passwords, etc.) helps. Security is done in layers.
For the sake of brevity, let’s call this server BigBrother, aka BB.
Server roles
Centralized logging – main role, security-wise. Gathers all the logs from the other network devices. Assuming intruders who do a bit more than just some random SQL injection, the first thing they’ll probably do is erase all traces. That means wiping the logs, either completely or, if they want to remain hidden in the system, just the parts that document how the intrusion happened. Those logs would still be on the logging server, though. Step two for the intruders is to set up a mechanism that (automatically) erases traces of their future activity. One way would be to inject a kernel module that blocks the unwanted log messages from ever being sent in the first place. Another way would be to modify something in the system, like the syslog binary, so that the messages don’t get written to disk. Finally, a third way would be to clean the logs after they are written to disk. BB doesn’t help much in the first case, but in the second case, if all the modification does is stop the messages from being written to disk, they might still get sent over the network. In the third case, BB has them all anyway. One could even compare the local and remote logs at the end of the day and see if anything is missing. Of course, nothing stops the intruders from modifying the syslog conf file so that nothing gets forwarded, but that might trip some other alarm, or /etc/syslog.conf might not be the real file. Layers and misdirection.
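The end-of-day comparison mentioned above could look something like the sketch below. It assumes the local file and BB's copy hold the same host and day in an identical line format, which depends on how the syslog daemons are configured; the function name and the sample lines are made up for illustration.

```python
from collections import Counter


def missing_from_local(local_lines, remote_lines):
    """Return lines the central server received that the local log no longer has.

    Both arguments are iterables of log lines for the same host and day.
    Counter subtraction keeps duplicates: if a line arrived three times on
    the central server but survives only once locally, it is reported twice.
    """
    local = Counter(line.rstrip("\n") for line in local_lines)
    remote = Counter(line.rstrip("\n") for line in remote_lines)
    return sorted((remote - local).elements())


if __name__ == "__main__":
    # Hypothetical sample data: the local log has been cleaned by an intruder.
    remote = [
        "Jan  5 03:12:01 web1 sshd[411]: Accepted password for root from 203.0.113.7",
        "Jan  5 03:12:09 web1 su: session opened for user root",
        "Jan  5 03:15:00 web1 cron[502]: job started",
    ]
    local = [
        "Jan  5 03:15:00 web1 cron[502]: job started",
    ]
    for line in missing_from_local(local, remote):
        print("missing locally:", line)
```

Anything this reports is worth a look, though a line can also go missing for innocent reasons such as log rotation happening between the two reads.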
Network monitoring – main role from a network administrator’s point of view. Keeps an eye on the other devices and services on the network and makes sure they do their job. It can also measure and record performance and draw some pretty graphs and reports for whoever requests them.
NTP server – for the centralized logging to make sense, all the devices on the network should have the same time, so an NTP server is needed. May as well be this one.
Log parser – since all the logs from the network come here, and they can be brought onto this server in a common format, it only makes sense that it should do the parsing too. Everything is logged in real time, so it can watch for security threats and produce mail/web/proxy statistics; basically, anything that the original host logs to syslog, BB can parse. The advantage is that the parser can be configured once for each type of log and then work for any new host brought onto the network.
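To illustrate the "configure once per log type" idea, here is a minimal sketch that parses traditional BSD-style syslog lines and counts failed SSH logins per host. The regular expression assumes the classic "Mon DD HH:MM:SS host tag[pid]: message" layout, and the function names and sample hosts are hypothetical; a real setup would more likely use one of the tools listed further down.

```python
import re
from collections import Counter

# Assumed layout: "Jan  5 03:12:01 host tag[pid]: message" (the pid is optional).
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Split one syslog line into its fields, or return None if it does not match."""
    match = SYSLOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def failed_logins_per_host(lines):
    """Count sshd 'Failed password' messages for each originating host."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if (
            entry
            and entry["tag"] == "sshd"
            and entry["message"].startswith("Failed password")
        ):
            counts[entry["host"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jan  5 03:12:01 web1 sshd[411]: Failed password for root from 203.0.113.7 port 4242 ssh2",
        "Jan  5 03:12:09 db1 sshd[87]: Failed password for root from 203.0.113.7 port 4250 ssh2",
        "Jan  5 03:15:00 web1 cron[502]: job started",
    ]
    for host, count in failed_logins_per_host(sample).most_common():
        print(host, count)
```

Because the rule keys on the tag and the message rather than on the host, a new server brought onto the network is covered without any extra configuration.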
IT inventory – it already watches a good chunk of the network in its monitoring role, so this could be extended to keep track of all the IT assets, hardware and/or software. Once a new machine is brought into the network, its data could be entered here, monitoring could be set up automatically, and BB could maybe keep an eye on what software is running on it. However, this role is limited by the requirement that as few people as possible have access, so if, for example, the accounting department needs to be able to see and/or manipulate the data, this shouldn’t be the machine to do it.
IDS / IPS control – since in its monitoring role it should have access to the whole network, this is a good candidate for the server that controls any kind of intrusion detection/prevention system installed on the network. It can distribute configurations/signatures to the other IDS/IPS agents.
Repository – might or might not make sense on your network. It can be a package repository, compiling/downloading the necessary updates/packages for all the OSs on the network, and a configuration repository, keeping the conf files for all the servers in one place. If one server crashes, just reinstall it and instruct it to pull its conf files from the repo, which works as long as the repo copies are kept in sync with the live ones. This does mean that those other devices need access to this server, though, and the different devices might have other conflicting requirements, so it might or might not work.
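Keeping the repo and the live conf files in sync is the part that tends to rot, so a periodic check helps. The sketch below compares SHA-256 hashes of every file in a repo directory against the matching file in a live directory. The directory layout and function names are assumptions for illustration, and a real network would first need some way of fetching the live files from each server.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def out_of_sync(repo_dir, live_dir):
    """Return relative paths whose live copy is missing or differs from the repo copy."""
    repo_dir, live_dir = Path(repo_dir), Path(live_dir)
    drifted = []
    for repo_file in sorted(p for p in repo_dir.rglob("*") if p.is_file()):
        relative = repo_file.relative_to(repo_dir)
        live_file = live_dir / relative
        if not live_file.is_file() or sha256_of(live_file) != sha256_of(repo_file):
            drifted.append(relative.as_posix())
    return drifted


if __name__ == "__main__":
    # Self-contained demo with throwaway directories instead of real servers.
    with tempfile.TemporaryDirectory() as repo, tempfile.TemporaryDirectory() as live:
        (Path(repo) / "ntp.conf").write_text("server 192.0.2.1\n")
        (Path(live) / "ntp.conf").write_text("server 192.0.2.99\n")
        print(out_of_sync(repo, live))
```

Files that exist only on the live side are deliberately ignored here; whether those count as drift is a policy decision.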
Tools
There are many ways to achieve the goal and many programs that can be used. Some are free, some cost a lot of money, and some are easier to set up than others.
Rsyslog – an advanced syslog server, free for commercial use, with commercial support also available. It can filter logs on virtually anything; it can log to a database, to a text file, or to different files based on host/time/service, and a lot more; it can listen for UDP or TCP connections; and it can encrypt traffic. It’s fast and backward compatible with the standard syslog. It’s got a good community and active development. The main alternative is syslog-ng, though I couldn’t say which one is better. Rsyslog seems to have a slight edge.
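Once a syslog server is listening on BB, custom scripts can send their messages to it as well, not just the system daemons. A minimal sketch using Python's standard `logging.handlers.SysLogHandler` over UDP follows. The helper name, the logger name and the addresses are placeholders, and the demo uses a throwaway local socket standing in for the real server.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(name, host, port=514):
    """Return a logger that forwards its records to a syslog server over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_USER,
        socktype=socket.SOCK_DGRAM,
    )
    # Prefix each message with the logger name so it shows up as the syslog tag.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Stand-in for the central server: a local UDP socket on a free port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2)
    port = receiver.getsockname()[1]

    log = make_remote_logger("backup-script", "127.0.0.1", port)
    log.warning("nightly backup took longer than expected")

    datagram, _ = receiver.recvfrom(4096)
    print(datagram)
    receiver.close()
```

UDP is fire-and-forget, so messages are silently lost if the server is unreachable; passing `socktype=socket.SOCK_STREAM` switches the handler to TCP, provided the server listens for it.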
Nagios – originally NetSaint, it’s a network monitoring application. Free, open source and highly configurable, it can be as lightweight as you want it to be. A commercial version is also available. It uses plugins and has a big community, so whatever you want to monitor, chances are someone wrote a plugin for it. It can do (some) network discovery via nmap, but that’s not its strong point. Basically, it will monitor whatever you tell it to monitor, whenever you tell it to, and it will alert in whatever way you have available, but you have to tell it to. Online demos of both the commercial and free products are available.
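If no existing plugin fits, writing one is not hard: a Nagios plugin is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a rough sketch of a disk usage check following that convention. The thresholds, function names and message wording are arbitrary choices, not anything Nagios prescribes.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit_code, status_line) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Measure disk usage at `path` and evaluate it against the thresholds."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        return UNKNOWN, f"DISK UNKNOWN - {error}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return evaluate(usage.used / usage.total * 100, warn, crit)


def main():
    """Print the status line and return the exit code for Nagios."""
    code, line = check_disk("/")
    print(line)
    return code

# A real plugin file would end with:
#     import sys
#     sys.exit(main())
```

Keeping the threshold logic in `evaluate` separate from the measurement makes it easy to test without touching a real disk.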
Adiscon LogAnalyzer – web interface for reading logs. Free to use, commercial support available in some form. Check the online demo to see what it can do.
Splunk – it can monitor almost any kind of log from any kind of device and generate reports/alerts or just display the data. Free for up to 500 MB a day of logs, commercial support available. Setting it up is easy and it looks like it can do a lot of things right. The commercial version can do some more things; a quick rundown of the features of each version is available online.
Cacti – the new MRTG. Monitors devices over SNMP and draws pretty graphs of how busy they are. Free, open source, good community. As with Nagios, if you want to monitor something, chances are someone already did it. And even if not, it’s relatively easy to set up, provided you know what you’re looking for in the SNMP tree.
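What tools like Cacti and MRTG graph is usually not the raw SNMP value but a rate computed from two samples of a counter. As a rough illustration (not Cacti's actual code), the sketch below turns two readings of a 32-bit octet counter such as ifInOctets into bits per second, allowing for the counter wrapping around once between polls. The function names and the 300-second interval are assumptions.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 2**32 - 1


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, allowing for a single wrap."""
    return (current - previous) % modulus


def bits_per_second(previous_octets, current_octets, interval_seconds):
    """Average traffic rate between two samples of an octet counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous_octets, current_octets) * 8 / interval_seconds


if __name__ == "__main__":
    # Two samples taken 300 seconds apart, with the counter wrapping in between.
    previous = 2 ** 32 - 1000
    current = 500
    print(bits_per_second(previous, current, 300))
```

On fast links a 32-bit counter can wrap more than once between polls, in which case this undercounts; that is why high-speed interfaces are normally polled through their 64-bit counters instead.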
Open-AudIT – inventory management. Basically, it gives you some scripts to run on computers that will upload data about the installed hardware/software to the server. With a little bit of work one can automate a big part of it in order to keep a closer eye on the network. The source of v2 is available online and it looks like they could really use some donations.
OSSEC – “Open Source Host-based Intrusion Detection System. It performs log analysis, file integrity checking, policy monitoring, rootkit detection, real-time alerting and active response”. It’s developed by Trend Micro, and commercial support is available. It runs on most operating systems, including Linux, MacOS, Solaris, HP-UX, AIX and Windows, and it can be set up stand-alone or as server/agent in a centralized management solution.
Snort – open source network intrusion prevention and detection system (IDS/IPS). That makes it a “network-based” IDS, as opposed to OSSEC, which is a “host-based” IDS. Basically, what it does is scan the packets as they move around the network and react if it detects a threat. Commercial support and signatures are available; it’s free to use otherwise, and it has a strong community.
Gnokii – CLI tool that can talk to mobile phones or GSM modems and make them do stuff. Like send SMS. Not exactly a server role, but if you have a GSM modem, using Nagios + gnokii you can set it up so it alerts you with an SMS if something goes wrong. As long as the modem can send SMS, it doesn’t tie you to any particular mobile network. It can also be set up as an SMS gateway that transforms emails or other forms of text into SMS. If the network doesn’t work at all you won’t get the warning emails, but you will get the SMS. It’s really a cool thing to have.
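Gluing the two together can be as simple as a small script that Nagios calls as a notification command. The sketch below assumes gnokii is installed and already configured for the modem; it relies on `gnokii --sendsms <number>` reading the message text from standard input. The function names and the trimming to 160 characters (the single-message limit for the default GSM alphabet) are choices made for this example.

```python
import subprocess

SMS_MAX_CHARS = 160  # single-part SMS limit with the default GSM 7-bit alphabet


def build_sms(number, text):
    """Return (argv, body) for sending one SMS through the gnokii CLI."""
    # Collapse newlines and repeated spaces, then trim to a single SMS.
    body = " ".join(text.split())[:SMS_MAX_CHARS]
    return ["gnokii", "--sendsms", number], body


def send_sms(number, text, timeout=60):
    """Send the alert; returns True if gnokii exited successfully.

    Assumes gnokii is on the PATH and its configuration points at the modem.
    """
    argv, body = build_sms(number, text)
    result = subprocess.run(
        argv, input=body, text=True, capture_output=True, timeout=timeout
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical alert text and phone number; nothing is sent here.
    argv, body = build_sms("+15550100", "CRITICAL: web1\nHTTP is down")
    print(argv, repr(body))
```

Only `build_sms` runs in the demo, so the script can be tried on a machine without a modem attached.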
VirtualBox – another program that doesn’t have an obvious place here, but using it can give the admin a lot of flexibility. One could set up a network between the virtual machines and the host, a network that would be hidden from the rest of the world. Different machines could be set up for different roles. If, say, the host is Linux but there’s a need to run a Windows-only tool, a virtual machine can be used. And it would all still be on the same physical hardware, so it could still have one entry point protected by one firewall and be easily backed up and restored.