Monitor your network using whatever you like

In a previous blog post we showed how to use a single Flock Networks Router to monitor your entire network, using the OSPF Link State Database. In this blog post we demonstrate a technique to monitor your network, using the client, application and language of your choice.

Users of the Flock Networks Routing Suite have been telling us that they like the network status information being presented in a JSON format. However it is frustrating not being able to easily get that information off the router. Ethan Banks was kind enough to live stream his first use of the Flock Networks Routing Suite. Ethan talks about wanting a remote monitoring API here. We have listened to this feedback, and Flock Networks Routing Suite version 20.0.4 now implements a REST API.

The REST API returns a JSON payload inside HTTP. Combining these two widely used standards allows the API to talk to a huge variety of clients. The REST API is by design Read-Only, you can view the network state but cannot change it. In HTTP terms this means the only HTTP method that is supported is GET.

Monitor using a Router

Routers R02, R04 and R05 are Flock Network Routers. By using one router, say R05, you can monitor the others.

To see the local state we use the existing client connection. This is a local Unix Domain Socket delivering a JSON payload.

flock@R05:~$ flockrsc ribv4 --prefix
{"ip_net":"0.0.0.0/0","origin":"Ospfv2","next_hops":[{"intf_id":2,"ip_addr":"10.0.6.189"}]}
{"ip_net":"10.0.1.0/24","origin":"Ospfv2","next_hops":[{"intf_id":2,"ip_addr":"10.0.6.189"}]}
{"ip_net":"10.0.2.0/24","origin":"Ospfv2","next_hops":[{"intf_id":2,"ip_addr":"10.0.6.189"}]}
    ...

To see the remote state we just add a --host <host-name / host-ip> option to the command. This is a remote connection using HTTP delivering a JSON payload. The JSON payload has an identical format to the payload returned by the local command.

flock@R05:~$ flockrsc ribv4 --prefix --host R04
{"ip_net":"0.0.0.0/0","origin":"Ospfv2","next_hops":[{"intf_id":2,"ip_addr":"10.0.5.225"}]}
{"ip_net":"10.0.1.0/24","origin":"Ospfv2","next_hops":[{"intf_id":2,"ip_addr":"10.0.5.225"}]}
{"ip_net":"10.0.2.0/24","origin":"Ospfv2","next_hops":[{"intf_id":2,"ip_addr":"10.0.5.225"}]}

There is no summarization in this network, so we expect all routers to have the same routes in the RIB. We can check this using the same technique we used in the previous blog post. We store the information we expect to be identical into a text file, then compare the text files for each router.

flock@R05:~$ flockrsc ribv4 --prefix --json-pretty | grep ip_net > R05-rib.txt
flock@R05:~$ flockrsc ribv4 --prefix --json-pretty --host R04 | grep ip_net > R04-rib.txt
flock@R05:~$ diff R04-rib.txt R05-rib.txt
flock@R05:~$

Monitor using a Host

We have converted R05 from a router into a host H05. A new L3 Network N07 has been installed connected to R02.

The Flock Networks Routing Suite client is now available on its own. We can convert R05 into a host by removing the Routing Suite Daemon and installing the Client package.

flock@R05:~$ sudo systemctl stop flockrsd
flock@R05:~$ sudo dpkg --purge flockrsd
(Reading database ... 27722 files and directories currently installed.)
Removing flockrsd (20.0.4) ...
Purging configuration files for flockrsd (20.0.4) ...
flock@R05:~$ sudo dpkg -i flockrsc_20.0.4_amd64.deb
Selecting previously unselected package flockrsc.
(Reading database ... 27717 files and directories currently installed.)
Preparing to unpack flockrsc_20.0.4_amd64.deb ...
Unpacking flockrsc (20.0.4) ...
Setting up flockrsc (20.0.4) ...
flock@R05:~$

H05 is no longer running any routing code, but can still monitor the network using the REST API. Since we are now a host rather than a router, we need to add a default route via R03.

flock@H05:~$ sudo ip route add 0.0.0.0/0 via 10.0.6.189 dev enp1s0

Let’s check that the RIB’s in R02 and R04 are consistent.

flock@H05:~$ flockrsc ribv4 --prefix --json-pretty --host R02 | grep ip_net > R02-rib.txt
flock@H05:~$ flockrsc ribv4 --prefix --json-pretty --host R04 | grep ip_net > R04-rib.txt
flock@H05:~$ diff R02-rib.txt R04-rib.txt
8d7
<     "ip_net": "10.0.7.0/24",
flock@H05:~$

So R02 has the N07 subnet but R04 does not. It looks like the new network N07 might not be being advertised in OSPF. Let’s check which interfaces on R02 are enabled for OSPFv2.

flock@H05:~$ flockrsc ospfv2 --area 0 --intf --host R02
{"ospf_area_id":"0.0.0.0"}
{"name":"enp1s0","id":2,"ip":"10.0.1.249/24","state":"Dr","dr_id":"10.0.100.2","dr_ip":"10.0.1.249","bdr_id":"10.0.100.3","bdr_ip":"10.0.1.246"}
{"name":"enp7s0","id":3,"ip":"10.0.2.167/24","state":"Dr","dr_id":"10.0.100.2","dr_ip":"10.0.2.167","bdr_id":"10.0.100.3","bdr_ip":"10.0.2.213"}
{"name":"enp8s0","id":4,"ip":"10.0.3.157/24","state":"Dr","dr_id":"10.0.100.2","dr_ip":"10.0.3.157","bdr_id":"10.0.100.1","bdr_ip":"10.0.3.176"}
flock@H05:~$

The interface connecting to N07 10.0.7.0/24 is not listed. Let’s check the newly live interface is in a good state (it probably is as we have already seen a RIB entry for it).

flock@H05:~$ flockrsc system --intf --host R02 | grep 10.0.7
{"name":"enp9s0","id":5,"ip_prefixes":["10.0.7.153/24"],"state":"Up"}
flock@H05:~$

Yes it is there, it’s name is “enp9s0” and it is Up. Let’s use ssh to look at the OSPFv2 config on R02.

    flock@H05:~$ ssh flock@R02 'cat /etc/flockrsd/ospfv2.toml'
    [ospf_v2]
    router_id = "10.0.100.2"
    [[ospf_v2.area]]
    area_id = "0.0.0.0"
    [[ospf_v2.area.intf]]
    name = "enp1s0"
    [[ospf_v2.area.intf]]
    name = "enp7s0"
    [[ospf_v2.area.intf]]
    name = "enp8s0"

And yes, we are missing the entry for “enp9s0”. Let’s ssh over to R02 and correct the OSPFv2 configuration. After that we can run our original RIB consistency test again, and we should get no diffs.

flock@H05:~$ flockrsc ribv4 --prefix --json-pretty --host R02 | grep ip_net > R02-rib.txt
flock@H05:~$ flockrsc ribv4 --prefix --json-pretty --host R04 | grep ip_net > R04-rib.txt
flock@H05:~$ diff R02-rib.txt R04-rib.txt
flock@H05:~$

Monitor using whatever you like

H05 has been swapped out for any host of your choice.

The Operating System on H05 can be pretty much anything you like. You can choose which application to use to connect to the REST API. You can choose which language you want to use to process the network information.

Let’s choose curl as our application to connect to the REST API. It runs on pretty much every Operating System out there. Say we want to get the OSPFv2 Area 0 Link State Database from R04 and use it as input for a Python program. The command to get this information from the Flock Client would be:

flock@H05:~$ flockrsc ospfv2 --area 0 --lsdb --host R04 --json

Note that we have added a --json option. The output is going to be fed into Python, so we want vanilla JSON, not the Flock Client default which is JSONL (JSON with extra newlines to help human readability).

If we add the --show-url option to any Flock Client command, it will display the REST URL that it would connect to, and then exit without attempting to actually connect.

flock@H05:~$ flockrsc ospfv2 --area 0 --lsdb --host R04 --json --show-url
REST API URL would be 'http://R04:8000/ospfv2/area?area_id=0.0.0.0/lsdb/json'
flock@H05:~$

We can then tell curl to HTTP GET from this URL and pipe the output into Python.

you@your-host:~$ curl -s -X GET "http://R04:8000/ospfv2/area?area_id=0.0.0.0/lsdb/json" | python -m json.tool
[
    {
        "lsa_age": 1742,
        "lsa_checksum": "0x8906",
        "lsa_id": "10.0.100.1",
        "lsa_len": 72,
        "lsa_opts": {
            "bits": 2
        },
        "lsa_router_id": "10.0.100.1",
        "lsa_seq": "0x80000009",
        "lsa_type": "Router"
    },
...

And that’s it. You now have complete visibility into your network. You have the network information in a structured format. You can use any tooling you like to operate on that information. Put on your Dev Ops hat and go forth and create !

More information on the REST API can be found here. The Flock Routing Suite can be downloaded for free from here.

If you have any feature requests, feedback etc, please email ‘support@flocknetworks.com’.

In the meantime “Happy Coding”

Nick

Moving from C to Rust

I have spent many years systems programming in C. I have been playing around with Rust since 2015 and this year I decided to switch to primarily programming in Rust. This blog post is an attempt to explain why I have made this decision.

Although I have a background in Computer Science, at heart I’m a Developer / Network Engineer who just wants to “get stuff done” ™. I’ve played with many languages over the years and for me it was C and Python that stuck. I’m still a big fan of both of those languages.

When I first started playing with Rust, all my initial projects were checking how it felt as a systems programming language. That is, could I layout structures in memory as required ? How easy was it to call into C libraries using FFI ? And in particular how easy was it to interface with the Linux Kernel and put IP packets ‘on the wire’ ? I found Rust a good fit for this, probably because it is a C like language, so looked and felt familiar. As an aside, many developers who try Rust initially end up “fighting the borrow checker” that is built into the compiler. [In very rough terms the borrow checker makes sure that your code either has a single pointer to a mutable object, or multiple pointers to an immutable object, but never both]. If you are coming from programming in C then you can quite quickly work out why it’s complaining. However, it really makes you think hard about your data model, and who really owns each allocation of data. This can be painful if you are just trying to prototype something, but it is time well spent on anything that will end up in production.

My two compelling reasons for moving to Rust

Having decided that Rust was a capable systems programming language, I found there were 2 compelling reasons to make the move to Rust.

The first reason is that Rust is memory safe at compile time. This means you can say goodbye to the runtime memory corruption bugs you get in C. In safe Rust there is no ‘reading of’ or ‘writing to’ an invalid pointer. As I said earlier, I like C as a language, but hitting memory corruption bugs in production code is not fun. The worst part is that the point at which the program terminates, say due to a segfault to an invalid address, is likely to be thousands or millions of instructions after the code with the bug has executed. This makes finding the offending code difficult. I once spent over 3 weeks fixing a memory corruption bug in some IP Router software. That’s not something I’m proud of, but nor is it something I’m ashamed of. We could only reproduce the corruption when the device was running in production, and with all logging turned off. Fixing the bug came down to reasoning about 100,000’s of lines of C code. Eventually we came up with a theory about what was happening, wrote a ‘fix’ and the problem was never seen again. Whether we actually fixed the issue, or just changed some timing of events, I will never know. As an engineer this feels wholly inadequate.

The second reason is that Rust is data race free at compile time. As discussed above, memory corruption in single threaded code can be very hard to debug. Memory corruption in multi-threaded code is an order of magnitude more complicated. In my experience, it is possible to write reliable multi-threaded C, but you need to be very conservative and keep the threading model very simple (no bad thing), but refactoring the code is a major undertaking. In multi-threaded code there is such an explosion of event sequences that can happen, it is not possible to have exhaustive testing in place. With CPU frequencies levelling off and being replaced by multiple cores, the future of systems programming will need to be multi-threaded.

Drawbacks of moving to Rust

Rust certainly has some drawbacks when compared to C. For me the main one is the size of the executable produced by the compiler. In my experience of writing similar applications in C and in Rust, Rust release binaries tend to be around 5x larger than C release binaries. This is a combination of the fact that the Rust compiler optimises for speed not size and Rust projects suffer from dependency bloat. The Rust binary can be made smaller, and the dependency bloat problem actually comes from the advantage that you can so easily reuse other people’s libraries. But with Rust there is an extra step you need to take before releasing code, and that is to audit what is contributing to the binary size and is it a reasonable trade off? In the same way performance needs to be benchmarked between releases, with Rust binary size also needs to be tracked, to catch and investigate any large increases.

Rust also currently suffers from long compile times, and IDE support using RLS is slow and a CPU hog.

Moving a development team from C to Rust

Moving a development team from implementing in language A to language B is always going to be a hard sell if they do not already know language B. The team will have years / decades of experience in language A. Rust is a new language so it is likely a team of C programmers will not know Rust.

On top of this, Rust tends to pay back during the final stages of the development life cycle. I would guess that a team that knows both C and Rust, who started working on a new project, would be able to ship the first version of the software earlier if they wrote it in C rather than Rust. Rust tends to catch bugs earlier, at compile time rather than run time, which delays the production of the next binary. Rust forces the developer to iterate over the data model, until it is clean, before the code gets too complex, again delaying the production of the next binary. With Rust the team will probably implement Unit tests and Integration tests as they develop because all the infrastructure is in hand. With C the team may choose to produce the binary first and then possibly wrap some testing around it after. The QA teams will hit runtime panics in Rust that are hidden in C, again pushing out the date that the project ships.

So the sales pitch to management would go something like this. We have a team of highly skilled C developers. I suggest our new product “The IP packet mangler” should be written in Rust. On one hand if we write it in C we will ship it earlier and bring in revenue earlier. On the other hand Rust is shiny and new and would look good on our CV’s. Also the Rust compiler gives very pleasant error messages.

But wait, by catching bugs early, Rust gives you a huge payback in the long term. The C team have shipped a release with more bugs than the Rust team. Customers will be hitting bugs in the C product that don’t exist in the Rust product. As each team is working on the v1.1 features the C team will be interrupted more on bug fixes. This will cause frustration as most bug fixes are more important than implementing new features. The C development team will be getting more randomly timed, high level interrupts of various length, stopping them doing the ‘real work’ of implementing the new features.

Maybe the v1.1 C and Rust versions will ship around the same time. After that the Rust team will pull ahead, as the bug backlog for both teams will now cover multiple releases. The Rust team will probably ship v1.2 and all subsequent releases earlier and to a higher quality.

Obviously there is a lot of conjecture in the above argument, and it is slightly tongue in cheek. However I think the main point stands that it is preferable to pay your development costs upfront and as early as possible.

In summary

You might be able to prototype faster in C, but I believe you will reach production quality faster in Rust. Also with Rust, keeping that production quality high going forward will be easier.

And that is why I have moved from C to Rust.