Ethertype

fredag 9. oktober 2015

SQL self joins explained

In my quest to transform a database of SNMP traps into actual information, I need to master joins. Or specifically self joins. What follows is an attempt at explaining this.
I'll be using two examples. The first goes straight to the solution. The next example is more long-winded.

The TL;DR version

Consider the following table:

+------+--------+---------+------+
| id   | c1     | c2      | c3   |
+------+--------+---------+------+
|    1 | name   | letters | adam |
|    1 | weight | kg      | 80   |
|    1 | height | cm      | 180  |
|    2 | name   | letters | brad |
|    2 | weight | kg      | 70   |
|    2 | height | cm      | 170  |
+------+--------+---------+------+

One may find it natural to think that the data were entered in groups by id. But records may actually be grouped by any of the columns. From a database perspective, the table design may not appear optimal. But for illustrating self joins it works well.

How do I get from that table......to this:

+------+--------+---------+------+
| id   | name   | weight  |height|
+------+--------+---------+------+
|    1 | adam   | 80      | 180  |
|    2 | brad   | 70      | 170  |
+------+--------+---------+------+

The solution is shown below. To give you a head start understanding it:
Any column you specify in your query must be unambigous.
AS means we assign an alias to something.
The column headers in the output do not originate in the first table above, but is assigned in the query.

SELECT 
nameTable.id AS id,
nameTable.c3 AS name,
weightTable.c3 AS weight,
heightTable.c3 AS height 
FROM example AS nameTable  
JOIN example AS weightTable ON nameTable.id = weightTable.id AND weightTable.c1 = 'weight'  
JOIN example AS heightTable ON nameTable.id = heightTable.id AND heightTable.c1 = 'height' 
WHERE nameTable.c1 = 'name';

Got that? Great. You have mastered self joins and may now return to the/your original problem. Or read on.

Expanded example

The next example makes use of two tables ('beings' and 'stuff').

SELECT * FROM beings; 
+------+------------+---------+
| id   | recordtype | content |
+------+------------+---------+
|    1 | firstname  | Arnold  |
|    1 | height     | 180     |
|    1 | weight     | 80      |
|    1 | age        | 50      |
|    1 | species    | human   |
|    2 | firstname  | Camilla |
|    2 | width      | 48      |
|    2 | species    | dog     |
|    3 | firstname  | Arnold  |
|    3 | length     | 30      |
|    3 | food       | tuna    |
|    3 | species    | cat     |
|    4 | firstname  | Edmund  |
|    4 | length     | 32      |
|    4 | food       | chicken |
|    4 | species    | cat     |
+------+------------+---------+

Notice how the table beings contain a variable number of records per id. And to spice it up even further, two different ids share the same recordtype and content.

SELECT * FROM stuff; 
+------+-----------+
| id   | birthyear |
+------+-----------+
|    1 |      2001 |
|    2 |      2002 |
|    3 |      2003 |
|    4 |      2004 |
+------+-----------+

Nothing exciting about this table.

First, we try to join data from the two tables. We will join data where the column 'id' matches between the tables.

SELECT 
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id;
+-----------+---------+
| birthyear | content |
+-----------+---------+
|      2001 | Arnold  |
|      2001 | 180     |
|      2001 | 80      |
|      2001 | 50      |
|      2001 | human   |
|      2002 | Camilla |
|      2002 | 48      |
|      2002 | dog     |
|      2003 | Arnold  |
|      2003 | 30      |
|      2003 | tuna    |
|      2003 | cat     |
|      2004 | Edmund  |
|      2004 | 32      |
|      2004 | chicken |
|      2004 | cat     |
+-----------+---------+

So... we got what we asked for. Right? Notice how we have to prefix 'id' with the tablename and a period on the last line of the query.

SELECT 
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id
WHERE recordtype = 'firstname';
+-----------+---------+
| birthyear | content |
+-----------+---------+
|      2001 | Arnold  |
|      2002 | Camilla |
|      2003 | Arnold  |
|      2004 | Edmund  |
+-----------+---------+

Now we're moving in the right direction. Can we get the id in the output as well?

SELECT 
id,
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id
WHERE recordtype = 'firstname';
ERROR 1052 (23000): Column 'id' in field list is ambiguous

Not like that, anyway. Remember being unambiguous when you ask for something. As a general rule in life. And in particular when querying a database. Prefix the id with tablename and a period. It does not matter which table...

SELECT 
stuff.id,
birthyear,
content
FROM beings 
JOIN stuff ON beings.id = stuff.id
WHERE recordtype = 'firstname';
+------+-----------+---------+
| id   | birthyear | content |
+------+-----------+---------+
|    1 |      2001 | Arnold  |
|    2 |      2002 | Camilla |
|    3 |      2003 | Arnold  |
|    4 |      2004 | Edmund  |
+------+-----------+---------+

Much better!
From here on we'll skip the 'stuff' table and see if we can join table 'beings' with itself! To do that, we employ the 'AS'...verb? subject? thingy?. Whatever. We use 'AS'. Look:

SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.recordtype,
namealias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname';
+------+------------+---------+------------+---------+
| id   | recordtype | content | recordtype | content |
+------+------------+---------+------------+---------+
|    1 | firstname  | Arnold  | firstname  | Arnold  |
|    1 | height     | 180     | firstname  | Arnold  |
|    1 | weight     | 80      | firstname  | Arnold  |
|    1 | age        | 50      | firstname  | Arnold  |
|    1 | species    | human   | firstname  | Arnold  |
|    2 | firstname  | Camilla | firstname  | Camilla |
|    2 | width      | 48      | firstname  | Camilla |
|    2 | species    | dog     | firstname  | Camilla |
|    3 | firstname  | Arnold  | firstname  | Arnold  |
|    3 | length     | 30      | firstname  | Arnold  |
|    3 | food       | tuna    | firstname  | Arnold  |
|    3 | species    | cat     | firstname  | Arnold  |
|    4 | firstname  | Edmund  | firstname  | Edmund  |
|    4 | length     | 32      | firstname  | Edmund  |
|    4 | food       | chicken | firstname  | Edmund  |
|    4 | species    | cat     | firstname  | Edmund  |
+------+------------+---------+------------+---------+

(I believe the term may be 'clause'.)
What happened here, is that we named table 'beings' as 'idalias' and extracted the columns 'id', 'recordtype' and 'content', and then we joined each of the new records with the same records from the 'beings' table (now renamed to 'namealias'), but only where the 'id' matches in both tables, and only where the 'recordtype' of namealias contains 'firstname'. Why on earth would I do that?!
Look what happens when I add one more condition for the join:

SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.recordtype,
namealias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.recordtype = 'species';
+------+------------+---------+------------+---------+
| id   | recordtype | content | recordtype | content |
+------+------------+---------+------------+---------+
|    1 | species    | human   | firstname  | Arnold  |
|    2 | species    | dog     | firstname  | Camilla |
|    3 | species    | cat     | firstname  | Arnold  |
|    4 | species    | cat     | firstname  | Edmund  |
+------+------------+---------+------------+---------+

If I know I don't care for all species, but only our feline friends, I can limit my selection even ~~furrier~~further:

SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.recordtype,
namealias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat';
+------+------------+---------+------------+---------+
| id   | recordtype | content | recordtype | content |
+------+------------+---------+------------+---------+
|    3 | species    | cat     | firstname  | Arnold  |
|    4 | species    | cat     | firstname  | Edmund  |
+------+------------+---------+------------+---------+

We're getting there

SELECT 
idalias.id,
idalias.recordtype,
idalias.content,
namealias.content,
foodalias.content
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat'
JOIN beings AS foodalias ON foodalias.id = idalias.id AND foodalias.recordtype = 'food'
+------+------------+---------+---------+---------+
| id   | recordtype | content | content | content |
+------+------------+---------+---------+---------+
|    3 | species    | cat     | Arnold  | tuna    |
|    4 | species    | cat     | Edmund  | chicken |
+------+------------+---------+---------+---------+

Let us trim the number of columns. And give those columns more descriptive names:

SELECT 
idalias.id AS id,
namealias.content AS firstname,
foodalias.content AS food,
speciesalias.content AS species
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat'
JOIN beings AS foodalias ON foodalias.id = idalias.id AND foodalias.recordtype = 'food'
JOIN beings AS speciesalias ON speciesalias.id = idalias.id AND speciesalias.recordtype = 'species';
+------+-----------+---------+---------+
| id   | firstname | food    | species |
+------+-----------+---------+---------+
|    3 | Arnold    | tuna    | cat     |
|    4 | Edmund    | chicken | cat     |
+------+-----------+---------+---------+

Much better! Now, if we wanted to figure out the dietary needs of one furry friend in particular, we could add a WHERE clause. Like this:

SELECT 
idalias.id AS id,
namealias.content AS firstname,
foodalias.content AS food,
speciesalias.content AS species
FROM beings AS idalias
JOIN beings AS namealias ON namealias.id = idalias.id AND namealias.recordtype = 'firstname' AND idalias.content = 'cat'
JOIN beings AS foodalias ON foodalias.id = idalias.id AND foodalias.recordtype = 'food'
JOIN beings AS speciesalias ON speciesalias.id = idalias.id AND speciesalias.recordtype = 'species'
WHERE namealias.content = 'Arnold';
+------+-----------+------+---------+
| id   | firstname | food | species |
+------+-----------+------+---------+
|    3 | Arnold    | tuna | cat     |
+------+-----------+------+---------+

I'll let this rest for now. Hope it was of use to you!

lørdag 3. oktober 2015

Logging SNMP traps to MySQL/MariaDB.

I recently got a request formulated approximately like this:
"How can our servicedesk count the number of devices a user has on our wireless network."

The reason for the request is that we enforce a maximum of two devices per user. But whenever a user tries with a third, there is no useful feedback to the end user that the number of devices has been exceeded. Hence they will call servicedesk to request help getting their device on the network. Because they are not aware of the limit, because they have forgotten about one or another device in their backpack, or because someone is abusing the account in question. Or any other reason. Ideally, we would give users nice feedback. Practically, we don't.

Our provider of wireless network gear got tools to manage the wireless network, which may or may not assist with counting devices per user. But they are not extremely user friendly. And we'd rather not let more users have access to these tools than strictly required. Can I make a simple tool that assists superusers and the servicedesk in simple debugging of wireless access clients?

Turns out I can.

The wireless controllers already emit a huge stream of SNMP traps to another network device which makes use of them to match usernames with IPaddresses. For... reasons. I can make use of the same messages. The following is an account of how I get the messages into a database, and how I can query the database to extract the information I want.

Before we start, you should be aware of a few important things with regards to SNMP:

SNMP version 1 and 2c are not encrypted protocols. Easily eavesdropped upon, easily spoofed.
The traffic is UDP-based. An adversary can flood spoofed packets from anywhere. See your friendly firewall admin for advice.
It is common to have separate SNMP communities for read and write access to an SNMP agent. Having no write community enabled until you really need it, it is a good idea.
The SNMP community is not a password. It is more like a username, for which there is no password.
An SNMP agent is a process which can be queried (read) and possibly configured (write), as well as emit traps ('alerts') directed at an SNMP trap receiver. The agent typically runs on a piece of network equipment, like a router, switch, firewall or even a server. The SNMP agent is called snmpd in most Linux distributions.
An SNMP trap receiver accepts traps with specific communities, transmitted from SNMP agents. The trap receiver typically runs on a server. The SNMP trap receiver is called snmptrapd in most Linux distributions.
For proper security, you must employ SNMP version 3. This is not covered in this post.

This is greatly simplified. A link to suggested reading materials is provided at the end.

I got my friendly sysadmin to spin up a virtual Ubuntu machine for me. Then, I logged in and did:

user@ubuntu:~$ sudo apt-get install mariadb-server mariadb-client snmp snmpd snmp-mibs-downloader

You will be prompted for an admin user/password etc. for the database. Note that MySQL/MariaDB both use an internal user database, unrelated to the system userdatabase. Writing the admin user and password down and keeping it in a safe place is a good idea.

MariaDB is a drop-in replacement for MySQL. I choose MariaDB over MySQL, as MariaDB had support for an option not present in the version of MySQL available to me at the moment. More about this in later posts. The names of the binaries and the startup-scripts are the same for MySQL and MariaDB. Drop-in replacement, remember.

The command above adds mysql and snmpd to your default services, and starts them both. We don't need snmpd at the moment, so we will stop and disable it.

user@ubuntu:~$ sudo service snmpd stop
user@ubuntu:~$ sudo update-rc.d snmpd disable

Now, we need to create the database the traps will be stored in, as well as the database user we will use when logging traps.

user@ubuntu:~$ mysql -u root -p

When prompted, provide the password you gave during install of MariaDB. You will be greeted with this prompt:

Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 47
Server version: 5.5.44-MariaDB-1ubuntu0.14.04.1 (Ubuntu)

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]>

Create the database we will log traps to:

MariaDB [(none)]> create database net_snmp;
Query OK, 1 row affected (0.03 sec)

Keep the database name 'net_snmp' for the purpose of logging traps.
If you end up with a ' ->' prompt, you have forgotten the terminating semicolon in the command above. Always terminate an sql command with a ';'. You can add that terminating semicolon at the ' ->' prompt. After every successful command you give the database engine, you will get a line stating Query OK, xxx row affected (0.03 sec). This is normal, and I will not repeat this line below.

Create the database user we will use to log traps with, and give that user all rights to tables in that database:

MariaDB [(none)]> create user 'netsnmp'@'localhost' identified by 'sekritpass';
MariaDB [(none)]> grant all on net_snmp.* to 'netsnmp'@'localhost' identified by 'sekritpass';

Create the schema we will enter data into.
A database schema is the actual structure we write data to, including the tables. Consider it the 'shape' and 'function' of the database. The file describing the database schema is, as far as I can tell, not installed with snmpd. But it is present in the net-snmp source code distribution.
I have reproduced it below.

Copy and paste this at the MariaDB prompt:

USE net_snmp;
DROP TABLE IF EXISTS notifications;
CREATE TABLE IF NOT EXISTS `notifications` (
`trap_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`date_time` datetime NOT NULL,
`host` varchar(255) NOT NULL,
`auth` varchar(255) NOT NULL,
`type` ENUM('get','getnext','response','set','trap','getbulk','inform','trap2','report') NOT NULL,
`version` ENUM('v1','v2c', 'unsupported(v2u)','v3') NOT NULL,
`request_id` int(11) unsigned NOT NULL,
`snmpTrapOID` varchar(1024) NOT NULL,
`transport` varchar(255) NOT NULL,
`security_model` ENUM('snmpV1','snmpV2c','USM') NOT NULL,
`v3msgid` int(11) unsigned,
`v3security_level` ENUM('noAuthNoPriv','authNoPriv','authPriv'),
`v3context_name` varchar(32),
`v3context_engine` varchar(64),
`v3security_name` varchar(32),
`v3security_engine` varchar(64),
PRIMARY KEY (`trap_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DROP TABLE IF EXISTS varbinds;
CREATE TABLE IF NOT EXISTS `varbinds` (
`trap_id` int(11) unsigned NOT NULL default '0',
`oid` varchar(1024) NOT NULL,
`type` ENUM('boolean','integer','bit','octet','null','oid','ipaddress','counter','unsigned','timeticks','opaque','unused1','counter64','unused2') NOT NULL,
`value` blob NOT NULL,
KEY `trap_id` (`trap_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

This states that we will use the net_snmp database, and creates the two tables our trap receiver expects to have access to. With the database schema in place, we are ready to configure the trap receiver.

Drop out of the MariaDB shell by typing exit at the prompt. Then create the following three files and make sure they have the same content as shown below:

root@ubuntu:~# cat /etc/snmp/snmptrapd.conf

authCommunity log mytrapcommunity

sqlMaxQueue 1

sqlSaveInterval 9

root@ubuntu:~# cat /etc/rc.local

/usr/sbin/snmptrapd -c /etc/snmp/snmptrapd.conf

exit 0

root@ubuntu:~# cat /etc/mysql/conf.d/snmptrapd.cnf

[snmptrapd]

user=netsnmp

password=sekritpass

host=localhost

The first file is the config file for the process receiving the snmp traps from your network device. The first line tells snmptrapd to log traps with the SNMP community mytrapcommunity. The next two lines instructs the trap receiver to log to mysql, have at most 1 traps in the buffer before commiting to the database, and commit to the database at least every 9 seconds. For production, you may want to increase the buffer to, say 20. Or a 100.

The second file is a dirty way of starting the SNMP trap receiver on boot. Any reader who wants to contribute a proper initfile for snmptrapd is welcome to do so.

The third file specifies the database user, password and hostname the trap receiver will use when logging to the database. You should recognise the user and password we created earlier.

And finally, we start snmptrapd:

user@ubuntu:~$ sudo sh /etc/rc.local

We should now be ready to receive SNMP traps and have them entered into the database. To send a test SNMP trap, execute the following command:

user@ubuntu:~$ sudo snmptrap -v 2c -c mytrapcommunity 127.0.0.1 "" 1.2.3.4.0

Log in to the database console again:

user@ubuntu:~$ mysql -u root -p
MariaDB [(none)]> use net_snmp;
[.....]
Database changed
MariaDB [net_snmp]> select * from varbinds;
+---------+------------------------+-----------+----------------------------------+
| trap_id | oid                    | type      | value                            |
+---------+------------------------+-----------+----------------------------------+
|       1 | .1.3.6.1.2.1.1.3.0     | timeticks | Timeticks: (8422516) 23:23:45.16 |
|       1 | .1.3.6.1.6.3.1.1.4.1.0 | oid       | OID: .1.2.3.4.0                  |
+---------+------------------------+-----------+----------------------------------+
2 rows in set (0.00 sec)

Congratulations! You have logged the first trap to the database. You will also find stuff in the table 'notifications'. You may now point your network devices at your SNMP trap receiver and store wast amounts of data in your database.

But as you may or may not know, having a huge amount of data does not automatically translate into information. I intend to write a part 2, where I give an example of how to make use of the trap database.

For further reading about SQL syntax, check out w3schools' SQL tutorial..
For further reading about SNMP, you could do worse than starting at the Net-SNMP wiki.

lørdag 17. januar 2015

WiFi terminology - by an amateur for amateurs

This is not intended as a dictionary. Read it straight through to get some mental knobs to hang stuff on. Revisit in whole or in part as needed.

2x2, 3x3, 4x4, 2rx2tx, etc
Various ways of saying how many antennas there are for transmitting and receiving signals in a MIMO setup. You will also encounter 2x2:2 or 3x3:3 or variations thereof. The ':n'-part says something about the number of spatial streams.
Specific hardware may handle fewer spatial streams than the number of antennas otherwise may indicate. In this case, the extra antenna is used for diversity. Link
802.11a/b/g/n/ac
These are the wireless standards, but you knew that already. Note that 802.11a and 11ac are valid in the 5GHz band only. 11b and 11g is 2.4GHz only. You may want to avoid clients with 11a, 11b or 11g on your network, as they spend much more time “on air” than 11n or 11ac for the same amount of data transmitted. Link. A very interesting graphic illustrating how we went from the 54Mbps of 802.11a/g to 600Mbps 802.11n can be found on page 5 of this PDF.
802.11ad
AKA WiGig. 60Mhz, max transmission rate 7Gbps. Products do exist. Not widely used yet.
802.11r/k
When a client realises that the link to the AP is gone (or going), it starts looking for another access point with the same SSID. If the client finds one, it initiates a handoff. The client and the new AP now starts what is termed a 4-way handshake.
With WPA2, WMM, 802.1x etc. added, the amount of initial traffic required to establish a connection goes up a lot, causing actual user traffic to stall. 802.11r is a protocol designed to pre-authenticate to other APs, such that the handoff is reduced to the original 4-way handshake. Thereby reducing the time associated with a handoff from one AP to another.
802.11k is related, sort of.
802.11e/WMM/WME/QoS
Provides mechanisms to give preference to some type of traffic or specific clients. Useful for time and/or jitter-sensitive data like real-time voice communication (Voice over WLAN), video streaming and battery-operated clients.
Antenna
WiFi-antennas are most often rather simple devices receiving and radiating the electromagnetic signal. They can be dual-band or single-band (2.4 and 5 Ghz). They have a gain measured in dB, which states how much a transmitted or received signal is strengthened.

A regular wifi antenna transmits a signal which best can be described as a donut shape, with the antenna pointing through the hole in the donut. Higher gain antennas transmits a flatter, wider donut than antennas with less gain. See the third grapic in this link.

Antenna gain applies to both transmit and receive.
Beamforming
Beamforming is the technology used to give the signal a boost in a particular direction. The glossy commercials present beamforming as a way to send a laser-sharp signals straight to the receiving end. The reality is more like an uneven donut. The signal gain is typically 2-3 dB, which may be enough to extend the range a bit, or enough to raise the bitrate one level at the same range.

Beamforming comes in two flavors: explicit and implicit. Explicit beamforming means that the client is aware of beamforming happing and actively contributes in the process of finding the optimal signal generated by the Access Point/Router. The client needs explicit support for this in hardware and the device-driver.

Implicit beamforming means that the client is unaware of beamforming taking place. It works regardless of support in the client hardware or software.

Beamforming is an optional feature of both 802.11n and 11ac. Not all vendors implement it, and worse is that explicit beamforming for 11n most likely requires equipment from the same vendor in both ends, as the standard does not mandate interoperability. For 11ac the standard is tighter, and interoperability should be ok for explicit beamforming, if implemented.

The cost of beamforming is slightly reduced througput, as the process requires some non-user traffic. If the beamforming results in a higher MCS index, this may make up (and some) for the reduced throughput. See these links for further reading. Link1 , Link2

The FCC appears to think that the use of beamforming requires reducing the outputpower equal to the antenna gain. If this is the case, one may wonder if beamforning actually provides any net gain at all, or if its only benefit is to reduce interference in all other directions. I have not made any effort to verify this claim.
Device drivers
Device drivers are part of the operating system kernel on the computer where the WiFi hardware is installed. In this context, the computer may be a client machine or a wireless accesspoint/router. Device drivers may be provided by the hardware manufacturer, either as a compiled binary, as closed source provided under NDA or as open source code. Or the device driver may be written by someone not associated with the manufacturer at all.

Open source code provided by the manufacturer and complying with established kernel interfaces is preferred, as it allows outsiders to compile the driver with any kernel they like, possibly adapt the driver to other operating systems, and generally tinker with the code.

Manufacturers are often wary about providing open source code for a number of reasons, most of which have nothing to do with technology at all.
DFS
Dynamic Frequency Selection ensures that we automatically avoid channels with interference, like weather radars and such. This is a mandatory requirement for equipment in the 5GHz band.
Firmware
Most WiFi hardware require firmware to operate. This is a piece of code loaded on the wifi hardware, and runs on the microprocessor on the actual wifi device (card/dongle/onboard). Firmware occasionally gets updated to fix bugs and/or add features. This is a closed source binary provided by the hardware manufacturer. The firmware ensures low-level protocol compliance, among other things.

To clear up some potential confusion: a wireless router comes with firmware. This would be the complete operating system of the wireless router, including drivers for the wireless interface, the ethernet interfaces and the webserver presenting a GUI to the end user. Then there is firmware on the wireless interface itself, most often provided by the wifi interface manufacturer to the wireless router manufacturer. This may or may not be part of the driver. The point is: firmware may mean more than one thing.
Hostapd
This is the meat of the matter. Hostapd is a user space daemon for access points and authentication servers.

When the WiFi hardware supports the low-level protocols and options you require, when the firmware of said hardware does the same, and when the kernel driver enables full use of the hardware, then hostapd is what implements an actual Access Point in the *WRTs.
HT20/HT40/VHT
...later..
Interface modes/Station roles
...later...
MIMO
Multiple Input Multiple Output. Uses multiple antennas for input and output, in order to create a better signal and increase bitrates. Link
SU-MIMO
Plain MIMO. Single-User MIMO.
MU-MIMO
As MIMO, but permitting multiple transmitters to send separate signals and multiple receivers to receive separate signals simultaneously in the same frequency band. Hence, Multi-User MIMO. Link
Transmission bitrate vs throughput
In short: the radio link spends a lot of data ensuring that the traffic the user wants to get through, gets through unmolested. There is a lot of redundancy, there is a lot of handshaking, there is a lot of waiting for the channel to be clear, and so on and so forth. The end result is that the end user never is going to get 1300Mbit throughput from an AC1900 router. Exactly how much throughput one may get depends on many factors. If you really need Gigabit throughput, a wired link (Ethernet) is better. You will get way better latency as well.
Transmission bitrates for 802.11ac here.
Transmission bitrates for 802.11n here.
Note how the actual bitrate is the result of the MCS (Modulation and Coding Scheme) index, number of parallell streams, the channel width and the guard interval between symbols. All these parameters are largely autotuned based on hardware design, signal quality, interference and so on and so forth.
Transmission Power and Receive Sensitivity
The maximum power with which signals are transmitted from a client is defined in the relevant standards. (a/b/g/n/ac). The maximum power level may differ in various geographic areas. The standards most likely defines a maximum EIRP, rather than exact power levels delivered to the antenna. EIRP is the sum of power from the amplifier and the antenna gain. As one can expect, the higher the EIRP, the longer range your signal will have.

The Receive Sensitivity says something about how weak signals you can translate into meaningful data. The various standards sets a minimum sensitivity for the equipment to be compliant. Better hardware got better sensitivity than the minimum defined in the standard. Better sensitivity translates into longer range and/or higher transmission rates.
Wave1/Wave2
The first 802.11ac specification is retroactively dubbed Wave1. Wave2 (or 802.11ac -2013 update) boosts the maximum transmission rate primarily through the use of MU-MIMO and improved beamforming. Devices certified for either wave may not necessarily implement all parts of the specification, as parts are optional. Buyer beware.
*WRT
There are a number of open source distributions for Access Points/Wireless routers. I.e. software that replaces whatever the router manufacturer put on the device. *WRT is my attempt at a common nickname for these distributions. Their lineage may be difficult to follow. This is one angle. Some are more actively developed than others. The concept of stable releases is largely left behind, it appears. The nice thing about these distributions is a common user interface, independent of brand and model. In other words: the user interface stays the same if you go with (for example) DD-WRT on your various wireless routers.

In more and more cases, individual routers do not have an «official» *WRT at all, but a version cooked by an individual developer with the right combination of hardware, interest and skill.

Companies like ASUS, NetGear and Linksys have largely embraced/copied the WRT model, but in a way that largely does not provide any benefit to the *WRTs. They cook their own WRT for their own hardware, but does not provide source for the WiFi-drivers. Hence it is impossible to cook a custom *WRT with a kernel providing other options than what the router manufacturer deemed appropriate. This may limit what functions and features the *WRTs can add to the device. The manufacturers firmware for the router largely works, as the kernel and driver has been explicitly finetuned for the exact hardware model it ships with, and with the options and features the manufacturer consider stable.

WiFi hardware and FOSS drivers on contemporary wireless routers.

Dag B., January 2015.

Prelude
I have recently done some research on what wireless router I should get get to replace the one I have. This turned out to be a “down the rabbit hole”-experience, so I thought I should share it with whoever is in the same position. I have tried to keep this readable and accurate, without covering every angle and diving into excessive details. Links are given for further reading.

A few things to note before we begin:
First: I made a terminology list here.
Second: this not going very high in the protocol stack. I focus on the actual WiFi-bit here.
Third: I consider anything not supporting 3-stream 802.11ac as legacy products. I am not interested.

And finally, I acknowledge that there are many aspects of a wireless router which can be used to define 'quality'. The wifi hardware, the amplifier, the antennas, the maturity of firmware, the maturity of the device driver, throughput, range, stability, the ability to tinker with the thing. To me, the ability to tinker weighs a lot. You may prioritize differently.

Contemporary wireless routers and type designation.
Contemporary wireless routers for the private market typically have type designations like N150, N600, N900, AC1200, AC1350, AC1600, AC1750 or AC1900. The very latest ones are designated AC2400, AC3200 and AC5200. There are likely more variants as well. Please note that routers now also appear with dual radios in the 5GHz band (MU-MIMO). Aggregate throughput goes up, but this does not necessarily imply an increase of the theoretical maximum throughput of a single individual client.

The N- prefix means that the coolest protocol it supports is 802.11n, while the AC-prefix implies support for 802.11ac. The number is the sum of the maximum transmission bitrate in the two (or three) radios in the device. This is not a throughput number a user ever will experience.

All contemporary routers are concurrent dual band routers. That is, they have two (or three) separate radios and operate in both bands at the same time. Hence, a device with support for 300Mbps in the 2.4 GHz band (802.11n) and 1300 Mbps in the 5GHz band (802.11ac), gets a designation of AC1600. Clear so far? Good.

Harsh words
Manufacturers of wireless routers need to distinguish their own kit from that of their competitors. Thus they invent fancy names like Beamfarming+, ClearAir, AIRadome, UltraLaser, MegaSuperDuper and sprinkle their marketing materials with that. All of that largely contains stuff that is described in the standards plus some tweak or secret sauce on top to justify inventing a new marketing term. Does it add any value? Possibly. You really need to test by yourself.

Speaking of tests:
When all is said and done, the performance tests performed by pundits, bloggers and journalists around the world are, in my opinion, not worth a whole lot. There are 3 gazillion little things which can impact the test results of a wireless link. They don't follow a common, agreed upon, testing procedure. The test environment is not controlled and verified. Tests are performed at different times and dates, with different clients, drivers, firmware, hardware versions, with clients from competitors, on machines with various operating systems in various grades of decay. The test results from various test suites may not be properly understood, may need interpretation or does not reflect real-world scenarios. Radio wave propagation theory and the effect of electromagnetic interference may be poorly understood.

In short: they are, by and large, not very trustworthy from the beginning. And the instant a new wifi driver/wifi-firmware/router-firmware is available, the old tests are obsolete in any case. Bam. Useless. Like Spanish Pesetas.

All of this also applies to any future performance test I happen to publish.

Notes on current wifi hardware and drivers for home routers and open source drivers.
A fairly up-to-date list of current wifi chipsets for 802.11ac can be found here, and yet another list is here.
A list of open source wifi drivers for linux can be found here and yet another list is here.

Broadcom
At the time of writing, Broadcom hardware is present in many routers at the higher end of the market. Typically the bcm4360 chipset these days. Sadly, the situation with regards to drivers for Broadcom is a mess. There are at least 3 or 4 different drivers for broadcom wifi hardware. There is a bucket-load of different chipsets. Some chipsets are partially supported by multiple drivers, some products are supported by no drivers at all, and the Broadcom chipsets present in many current high-end routers do not come with open source drivers at all. None of the open source drivers for Broadcom hardware enables 802.11ac, as far as I can tell. The bcm43602 may be an exception.

Googling for broadom and linux very quickly looses its appeal. I have no idea if any of the drivers are feature complete for any hardware, but that really applies across the board. (Not just Broadcom.)

This means that tinkerers have limited options when it comes to building custom kernels for many Broadcom-equipped routers. They must largely use the exact kernel sources provided by router manufacturers and a binary driver provided by Broadcom to router manufacturers. The binary driver does not conform to acknowledged interfaces in the linux kernel, so enabling options in the kernel other than the ones Broadcom found interesting may be hit or miss.

Broadcom hardware generally supports both explicit and implicit beamforming.

A special note on Broadcom: their 600 Mbps transmission rate in the 2.4GHz band uses a non-standard MCS, and is compatible only with client hardware from Broadcom. Client hardware from Broadcom is very common, but for other clients there is no difference in throughput if the router is designated AC1750 or AC1900.

Modulo other improvements or features of the AC1900 routers, of course.

Qualcomm/Atheros
Atheros, now a subsidiary of Qualcomm, also pushes wifi solutions for router manufacturers. They have long been a supporter of linux and open source. The open source drivers are developed in the open, and updated firmware appears regularly. Current home routers with Atheros almost all(?) use the qca9880 rev.2 chipset (3x3:3) and the ath10k driver. Beamforming is not supported.

The next generation chipset is named qca9990. Driver situation is unknown. Note that the first revision of qca9880 is not supported by ath10k.

A list of devices with the 9880 can be generated from the rather excellent site wikidevi.com. Check this, and scroll down a little bit.

Marvell
The Marvell 88W8864 is a 4x4:3 802.11ac solution for routers. It supports beamforming. A first cut of an open source linux kernel driver was revealed at the end of 2014. It may take some time to stabilize and end up in the upstream kernel. One may hope there are sufficient resources allocated to the task.

Wikidevi list of devices with this chipset.

Quantenna
At time of writing, the only vendor with a 4-stream capable chipset shipping in an end-user product. Open source drivers appears to be MIA. They have multiple chipsets. See this link, and look in the ESystems column to see available products.

As far as I can tell, these are the 4 companies with actual shipping products, used in actual routers supporting 802.11ac with 3 or more streams.

What router to get?
If you have no plan whatsoever to fiddle with the original software of the router, it does not really matter. One may assume, that the router manufacturer ships something that works. Or will provide software that eventually fixes the most annoying bugs. Those of us who have been burned by a-ok hardware with absolutely crap software which the product manufacturer quickly abandoned, will think again.

A mature open source driver, developed in the open, in cooperation with those in charge of the linux kernel, makes for a much higher likelyhood of something that can be tinkered with.

With those requirements, there appears to be no alternative to Qualcomm/Atheros-equipped hardware at this moment in time. Marvell may end up as a worthy competitor, but only time will tell. Some of the very latest routers in the segment are equipped with Broadcom bcm43062 wifi hardware, which supposedly is supported in the vanilla linux kernel. (3.19+.) I do not know if that driver makes for a useable access point mode, or if a closed source alternate driver is required for this.

This blogpost may provoke someone to enlighten me. I am getting a qca9880-based router in a day or two. Will see if I can make a writeup on that too. Eventually.

Hopefully, the driver isn't all crap. And remember to take any performance number I quote with a bucketload of salt.