15 April 2015

Nancy with F#, some helpers for routes

Nancy is such a great framework for making HTTP services on the .NET platform, especially if you are using C#. Nancy depends on a lot of C# trappings like dynamic variables, inheritance, and so forth. However, it's quite unpalatable when used from F#. There isn't a dynamic type in F#, so the common alternative is defining your own ? operator to access dynamic member parameters. Defining routes is also just plain weird from F#, and it only gets weirder when you try to do asynchronous routes; the translation from an async C# lambda is not a smooth one. There's also having to box everything, because F# will not automatically upcast to System.Object.

I think this is why there seem to be a good handful of Nancy frameworks for F#. Most of them do more than just provide sugary syntax for defining routes, however, and that syntax is all I really wanted for the time being.

So here are my helper functions for making routes, especially async routes, a bit more palatable from F#.

    open Nancy // for NancyModule and DynamicDictionary

    let private addRouteSync path f (router:NancyModule.RouteBuilder) =
        router.[path] <-
            fun (parameters:obj) ->
                f (parameters :?> DynamicDictionary) |> box

    // async is the default
    let private addRoute path f (router:NancyModule.RouteBuilder) =
        router.[path, true] <-
            fun (parameters:obj) cancellationToken ->
                async { // unwrap and box the result
                    let! result = f (parameters :?> DynamicDictionary) cancellationToken
                    return box result
                } |> Async.StartAsTask

    // more F#-friendly methods
    type NancyModule with
        member me.post path f = me.Post |> addRoute path (f me)
        member me.get path f = me.Get |> addRoute path (f me)
        member me.put path f = me.Put |> addRoute path (f me)
        member me.delete path f = me.Delete |> addRoute path (f me)
        member me.patch path f = me.Patch |> addRoute path (f me)
        member me.options path f = me.Options |> addRoute path (f me)

        member me.postSync path f = me.Post |> addRouteSync path (f me)
        member me.getSync path f = me.Get |> addRouteSync path (f me)
        member me.putSync path f = me.Put |> addRouteSync path (f me)
        member me.deleteSync path f = me.Delete |> addRouteSync path (f me)
        member me.patchSync path f = me.Patch |> addRouteSync path (f me)
        member me.optionsSync path f = me.Options |> addRouteSync path (f me)

Here is a dumb example of asynchronous usage (m is the NancyModule):

        m.put "/orders/{id:string}"
            <| fun nmodule parameters cancelToken ->
                let id = parameters.["id"].ToString()
                async { return id } // echo id

Now the function you set up receives the parameters as a DynamicDictionary instead of obj. You can also return whatever you want (e.g. a Response), and these helpers will box it for you before handing it back to Nancy. Your function can also directly return an Async, and these helpers will convert it to the Task type that Nancy expects. I'm also passing in the NancyModule in case your code hangs off an F# module (essentially static code) instead of the NancyModule class itself.

I basically only use the NancyModule as an entry point (like a static void main) and try to remain functionally-styled with my real code.
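
To give a fuller picture, here is a minimal sketch of what a whole module might look like with these helpers. The module name and routes are hypothetical, and it assumes the extension members above are in scope:

    type OrdersModule() as m =
        inherit NancyModule()
        do
            // async route: echo back the id from the URL
            m.put "/orders/{id:string}" (fun nmodule parameters cancelToken ->
                let id = parameters.["id"].ToString()
                async { return id })

            // sync route: a trivial ping
            m.getSync "/ping" (fun nmodule parameters -> "pong")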

14 April 2015

Making functions thread safe with agents

The F# MailboxProcessor class is pretty awesome. It allows you to safely run code that would ordinarily be vulnerable to concurrency problems, such as code that uses a non-thread-safe collection (like my previously defined CircularDictionary), without worrying about locking.

However, it's a bit of a mess of boilerplate to set up. It also doesn't preserve the nice functional semantics of F#, since you have to use an object. And it's a bit wonky to call, what with the reply channel and all.

But I wouldn't mention it without also having a solution. Here is a little module I threw together that takes a presumably non-thread-safe function, wraps it in an agent to make it thread safe, and returns an identical function which will route the call through the agent.

module Agent =
    /// Wrap the given asynchronous 1 parameter function
    /// in a MailboxProcessor and provide a method to call it.
    let wrap fAsync =
        // create and start a new mailbox processor
        let agent =
            MailboxProcessor.Start
            <| fun inbox -> // inbox is the same as the agent itself
                let rec loop () = async { // define the async message processing loop
                    let! (message, reply) = inbox.Receive() // wait on next message
                    let! result = fAsync message // run async fn
                    reply(result) // reply
                    return! loop () // continue, tail call recursion
                }
                loop () // start the message processing loop
        // create a fn that appropriately posts messages to agent
        let fn x =
            agent.PostAndAsyncReply(fun chan -> (x, chan.Reply))
        fn // return fn

    /// Wrap the given asynchronous 2 parameter function
    /// in a MailboxProcessor and provide a method to call it.
    let wrap2 fAsync2 =
        // wrap two params into 1 tuple
        let fn1 = wrap <| fun (a,b) -> fAsync2 a b
        // convert 2 args to 1 tuple
        fun a b -> fn1 (a,b)

I am defaulting to calling functions that return asynchronous results. That can easily be changed by swapping the let! for a let (without the bang). This doesn't really provide a way to stop the agent, but in my case I don't care if messages are lost on shutdown. It's not hard to make a stoppable version if you care about that.
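
For instance, a synchronous variant might look like the sketch below (wrapSync is a hypothetical addition, not part of the module above):

/// Wrap the given synchronous 1 parameter function
/// in a MailboxProcessor and provide a method to call it.
let wrapSync f =
    let agent =
        MailboxProcessor.Start
        <| fun inbox ->
            let rec loop () = async {
                let! (message, reply) = inbox.Receive() // wait on next message
                let result = f message // plain let: run the sync fn
                reply(result)
                return! loop ()
            }
            loop ()
    fun x -> agent.PostAndAsyncReply(fun chan -> (x, chan.Reply))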

Here's how I'm calling it:

module IdGenerator =

    let Create (esConnection:IEventStoreConnection) idHistorySize =

        // changes to history are not thread safe
        let idHistory = new CircularDictionary<TrackingId, CreatedId>(idHistorySize)

        let generate trackingId prefix =
            ...

        // wrap generate function in an agent, thread-safe for great justice
        Agent.wrap2 generate

Oh yes, and since agents are asynchronous, the result of the wrapped function is asynchronous. At worst, you can call Async.RunSynchronously to wait for the message to finish and get the result.
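
Calling the wrapped generator then looks just like calling the original function, except the result comes back as an Async. A quick sketch (esConnection, trackingId, and the "ORD" prefix are made-up placeholders):

// create the thread-safe generator
let generate = IdGenerator.Create esConnection 1000

// the wrapped function returns an Async
let id =
    generate trackingId "ORD"
    |> Async.RunSynchronously // or use let! inside an async block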

Some links on F# Async:
http://fsharpforfunandprofit.com/posts/concurrency-async-and-parallel/
https://msdn.microsoft.com/en-us/library/dd233250.aspx

12 April 2015

Weak Serialization

When I started implementing JSON-based messaging for the first time, my first step was to make a monolithic object for each use case.

public class UseCase1 // command
{
    public Guid MessageId {get; set;}
    public string TargetId {get; set;}
    public int Version {get; set;}
    public string UseCaseData1 {get; set;}
    public int UseCaseData2 {get; set;}
}

This is actually not a bad way to do it. However, I eventually became annoyed at having the same boilerplate metadata (MessageId, TargetId, Version, for instance) in every message. In grand “I love compiled objects” fashion, I decided it was possible to encapsulate any given message into one segmented contract. Something like this:

public class CommandMessage
{
    public Metadata Meta {get; set;}
    public ICommand Data {get; set;}
}
public class Metadata
{
    public Guid MessageId {get; set;}
    public string TargetId {get; set;}
    public int Version {get; set;}
}
public interface ICommand { }
public class UseCase1 : ICommand
{
    public string SomeData {get; set;}
}

I have the data and metadata separated. However, this doesn’t quite sit right. Message construction on the client now deals with 2 objects. Ok, not a big deal... You also have to make sure that you can deserialize Data to the proper concrete type, which means the client must send a type hint as part of the data. No problem, I guess…

But eventually it hit me that my compiled-object obsession was missing one of the biggest advantages of the JSON format. It’s something I first heard of on the DDD/CQRS groups, called weak serialization. And once you see it, it’s so obvious. So let me boldly state the obvious with code.

public class Metadata
{
    public Guid MessageId {get; set;}
    public string TargetId {get; set;}
    public int Version {get; set;}
}
public class UseCase1
{
    public string UseCaseData1 {get; set;}
    public int UseCaseData2 {get; set;}
}
JSON sent from the client:

{
    "MessageId": "...",
    "TargetId": "asdf-1",
    "Version": 0,
    "UseCaseData1": "asdf",
    "UseCaseData2": 7
}

And the data turned into server objects:

var meta = JsonConvert
    .DeserializeObject<Metadata>(requestBodyString);
var command = JsonConvert
    .DeserializeObject<UseCase1>(requestBodyString);

Yes, duh.

Lesson: don’t treat the JSON message as an “object”. Treat it as a data container (a dictionary) which could represent multiple objects. This also frees me to add arbitrary meta information to any message without affecting other parts of the system… naming conflicts notwithstanding.
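
Going the other direction works the same way. Here is a minimal F# sketch of the idea, assuming Json.NET's JObject.Merge (available since Json.NET 6): serialize the metadata and the command separately, then merge them into one flat message.

open Newtonsoft.Json.Linq

// merge metadata and command data into one flat JSON message
let toMessageJson (meta: Metadata) (command: UseCase1) =
    let json = JObject.FromObject(meta)
    json.Merge(JObject.FromObject(command))
    json.ToString()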

09 April 2015

Circular Dictionary, where are you?

It is surprisingly hard to find a .NET implementation of a "circular dictionary".

(At least not in the 15 minutes that I looked for one)

In my definition, it's a dictionary for key-based lookups that won't exceed a set capacity. It does this by removing the oldest entries to make room for new ones. My use case for this is to keep a limited history of the last ??? IDs which have been issued. That way, if a client has a transient network failure, they can just retry with the same request ID, and I will provide the same answer back. I could even save the history periodically so it could be loaded on startup if the failure was server-side. The capacity limit is needed so that memory usage doesn't grow unbounded over time.

I originally looked at using an OrderedDictionary for this purpose, but its implementation is such that a Remove operation is O(n), because it copies all the elements of the array down when one is removed.

The way around that is to use a circular buffer instead of an array that is always strictly ordered. In other words, when the array fills up, it just loops back around to the beginning and starts replacing existing entries.

However, it took me a surprising amount of thought to come up with a solution. And after all that mind churn, the implementation isn't even complicated. You will have to forgive the very OO-centric F#, though.

namespace Xe.DataStructures

open System.Collections.Generic

type CircularDictionary<'key, 'value when 'key : equality>(capacity: int) =
    inherit Dictionary<'key, 'value>(capacity)

    let maxIndex = capacity - 1
    let buffer = Array.zeroCreate<'key> capacity
    let mutable isFull : bool = false
    let mutable next : int = 0

    member me.CircularAdd(key:'key, value:'value) =
        let put () =
            me.Add(key, value)
            buffer.[next] <- key
        
        let moveNext () =
            if next = maxIndex then
                isFull <- true
                next <- 0
            else
                next <- next + 1

        let clear () =
            me.Remove(buffer.[next])
            |> ignore

        if isFull then clear()
        put()
        moveNext()

The unit tests for this are stupid -- too long and boring to post. They amount to calling CircularAdd, verifying the count, verifying that the last X items are still in the dictionary, and verifying that the ones added before that are not.
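
But the behavior is easy to demonstrate in miniature (a quick sketch, not the actual tests):

let d = CircularDictionary<string, int>(3)
d.CircularAdd("a", 1)
d.CircularAdd("b", 2)
d.CircularAdd("c", 3)
d.CircularAdd("d", 4) // buffer wrapped: "a", the oldest key, was removed

printfn "%b" (d.ContainsKey "a") // false, evicted
printfn "%b" (d.ContainsKey "d") // true
printfn "%d" d.Count             // 3, never exceeds capacity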

Obvious things: it's not thread safe; call CircularAdd only after checking that the key doesn't already exist; initialize it with a positive capacity. My use case happens to sidestep all of these, so I didn't bother with them. Feel free to use it, but I'm not responsible for it breaking when you do. :)

08 April 2015

F# dependency wrangling, interop, Nancy

Defining dependencies, C# vs F#

One of the advantages of using F# is the ability to define things as functions rather than extra types. For instance, let's say I wanted to define a dependency (I like the term resource) that generates an ID.

In C#

I would likely do something like this:

public interface IGenerateIds
{
    string GenerateId(string entityPrefix);
}

Then I would have to make different concrete implementations for testing, and also a production one:

public class TestGeneratorOf1 : IGenerateIds
{
    public string GenerateId(string entityPrefix)
    { return "1"; }
}

// you are testing this right?
public class TestGeneratorFails : IGenerateIds
{
    public string GenerateId(string entityPrefix)
    { throw new Exception("died"); }
}

public class RealGenerator : IGenerateIds
{
    public string GenerateId(string entityPrefix)
    { /* real implementation */ }
}

In F#

I can directly define function signatures (called type abbreviations):

type GenerateId = string -> string

This says: given a string, return a string. On its own that is a bit opaque, so I can use aliases to give the signature more meaning when I look at it later:

// aliases
type EntityPrefix = string
type CreatedId = string

// same signature, string -> string, but more descriptive
type GenerateId = EntityPrefix -> CreatedId

This works a treat, and later I can easily define functions that implement this signature for whatever I need to test:

let generateId1 prefix = "1"
let generateFail prefix : string = failwith "died"
let realGenerator prefix =
    // real implementation

The first two functions are so simple that they don't even need their own class file; I will likely create them inline with the tests. The dependency signatures are small enough that they can all go in one centralized file and still be easily understood.
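
To consume one of these dependencies, a function simply takes it as a parameter, and the composition root decides which implementation to pass. A hypothetical sketch:

// the dependency is just a function parameter
let createEntity (generateId: GenerateId) (data: string) =
    let id = generateId "cust"
    // ... persist data under the new id ...
    id

// tests pass a fake; production partially applies the real implementation
let testResult = createEntity generateId1 "some data"
let production = createEntity realGenerator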

Compare the cognitive and organizational burden of the number of files. C# has at least 4 files for every dependency: interface, test pass, test fail, real. F# really only needs 1 file per dependency for the real implementation.

In my estimation, the F# version has a lot better signal-to-noise than the traditional OO dependency management!


That's great and all, but then there's interop...


So, I went to put this into practice, creating a REST endpoint using Nancy. Now, Nancy is a brilliant framework, but it is quite OO-centric. Modules require inheriting a base class. DI requires interfaces and concrete classes. Fortunately, F# has these bits in it, and you pretty much have to use the inheritance to use Nancy. But I wanted to continue to use functional dependencies. I also wanted to be able to unit test the endpoint with its dependencies in various conditions (good and bad) without using the real dependencies (i.e. database calls), but still have the option to deploy the same service with real dependencies and no change to the endpoint itself.

In Nancy, the way to do that is with a bootstrapper. I created bootstrappers for different dependency combinations that I wanted to test. These put the appropriate dependency in every request context.

type IdGenGoodBootstrapper() =
    inherit DefaultNancyBootstrapper()
    override this.RequestStartup(container, pipelines, context) =
        context.Items.["somekey"] <- box generateId1

type IdGenBadBootstrapper() =
    inherit DefaultNancyBootstrapper()
    override this.RequestStartup(container, pipelines, context) =
        context.Items.["somekey"] <- box generateFail

And I can set up my Nancy tests (nuget Nancy.Testing) to use those bootstrappers:

let browser = new Browser(new IdGenGoodBootstrapper())

Then my endpoint can pull the dependency function out of the request context...

let gen = unbox<GenerateId> x.Context.Items.["somekey"]
// KABOOM

... except that it crashes

It appears that the way F# works, these function signatures only exist for the compiler and the IDE. At run time they become core F# types (e.g. FSharpFunc). Not only that, but it appeared to me that they could be compiled more generically than the defined abbreviation: e.g. an FSharpFunc<String, String> could end up as an FSharpFunc<object, String> if the function doesn't actually do anything String-specific. Combine that with the fact that, at run time, you can't recover the real type definition of an abbreviated type (e.g. typedefof<GenerateId> is just FSharpFunc with no type parameters given). As a result, I found no way to pull a pure function back out of the dictionary and use it.

Coming from C#, this was completely unexpected to me. And I suppose it speaks to the very OO-ingrained nature of the CLR. However, just before posting a well-developed Stack Overflow question on the subject, I discovered a work-around.

Taking advantage of the OO nature, I can wrap the function signature in a record type (which ultimately gets emitted as a class, I think), and the run time can apparently cast that back to something that works.

type GenerateId = { GenerateId: EntityPrefix -> CreatedId }

Then the code has to be changed to wrap the function in a record.

In the bootstrappers:

type IdGenGoodBootstrapper() =
    inherit DefaultNancyBootstrapper()
    override this.RequestStartup(container, pipelines, context) =
        context.Items.["somekey"] <- box { GenerateId = generateId1 }

type IdGenBadBootstrapper() =
    inherit DefaultNancyBootstrapper()
    override this.RequestStartup(container, pipelines, context) =
        context.Items.["somekey"] <- box { GenerateId = generateFail }

In the endpoint:

let gen = unbox<GenerateId> x.Context.Items.["somekey"]
let result = gen.GenerateId "asdf"

And now it works!

Discovering this was quite the process. It was a bit of a letdown that function signatures are so limited at run time, but the work-around is quite minimal. Overall, I am very happy with the way this works.

02 April 2015

ID Generation Strategies

The theories

Historically, common practice has been to let the database decide the next ID for a new entity. The database was THE place where consistency was maintained, and therefore the best source of truth for what's available next.

Things can go wonky from there when you get into systems with eventual consistency between writes and reads. As a result, I have been investigating other ID generation strategies.

One approach that has become popular is to use UUIDs for entity IDs. Clients can generate UUIDs at will, and they are reasonably guaranteed to be unique. Initially, I thought I would switch to this, but further research brought up 2 problems. One, various platforms have differing capabilities for generating UUIDs, especially looking at HTML5. Two, and more importantly, UUIDs are hard to remember, to communicate, to type. Even if they aren't exposed to the users, *I* still have to deal with them when debugging, querying, etc.

So the next idea was to use an ID issuance service, where the client first requests a new ID. This can be retried as necessary until one is obtained. Once obtained, the client makes requests against the ID. The downside here is the risk of unused IDs due to transient failure. I could also imagine misbehaving retries creating swathes of unused IDs.

Then you can diverge into ID request tracking -- clients send a UUID, then poll another resource by that UUID to watch for the ID to be generated... or at the very least, the UUID can enable retries on creating the same ID. Or you can use a HiLo-type algorithm, where the client itself reserves a block of IDs ahead of time and uses them until it runs out. Again, non-sequential IDs (across different clients) and (blocks of) unused IDs are the minor drawbacks.
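
To make the HiLo idea concrete, here is a rough sketch (names and numbers are made up; getNextHi would call the ID service to reserve a block):

type HiLoClient(getNextHi: unit -> int64, blockSize: int64) =
    let mutable hi = getNextHi () // block reserved up front
    let mutable lo = 0L           // position within the reserved block

    // not thread safe; wrap it in an agent if that matters
    member this.NextId() =
        if lo >= blockSize then
            hi <- getNextHi () // block exhausted, reserve another
            lo <- 0L
        let id = hi * blockSize + lo
        lo <- lo + 1L
        id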

An Implementation

My first attempt at ID generation used the conventional model of creating an ID when an entity is created, based on the next available ID from the database, then returning that ID to the caller (e.g. in a Location header, for REST-oriented folks). The client is also sending me a GUID (or UUID) with the request so I can trap retries (... or can I?). Ultimately, the next important decision point was:

What do I do if a client sends me the same Request UUID more than once? Then to my horror, the answer is predictably: "That depends." If the client initially failed to receive a response and is retrying the entity creation (and assuming the server can query generated IDs by UUID), then the correct response would be to give them back the same previously generated ID for that UUID. However, if it's months later and, due to bad pseudo-RNG, the same UUID is generated for a fresh request, then the correct response would be to error. Or perhaps better, I could still return the previously generated ID and depend on the POST itself to fail because the entity had already been created. This is essentially a concurrency error where I expected entity version -1 (not created), but the actual version was >= 0.
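
Sketched out, the retry side of that decision looks something like this (tryFindIssuedId, generateNextId, and recordIssuedId are placeholders for whatever the ID store provides; the rare UUID collision falls through to the POST itself, which fails with the concurrency error described above):

let idForRequest requestUuid =
    match tryFindIssuedId requestUuid with
    | Some previousId -> previousId // retry: give back the same answer
    | None ->
        let newId = generateNextId ()
        recordIssuedId requestUuid newId // remember it for future retries
        newId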

Implementing ID generation tracking in order to allow at-most-once creation semantics has its own challenges. Tracking means storing the request ID/entity ID pairs and then either querying or caching that store. Querying can be a problem depending on your ID store. I'm using EventStore so as to avoid adding another database into the mix, and querying it each time is essentially a table scan -- not ideal. Caching is a brand of musical chairs that can work, but it can get complicated if "done right". The sticking point on caching for me is expiring least-recently-used entity ID sets (each entity has its own incrementing set of IDs) so that memory usage doesn't grow unbounded. Maybe in 5 years every RequestID/EntityID fits in memory just fine, but maybe it doesn't! Cache-miss loading performance is also likely to become an issue over time.

Current Conclusions

Should I try to guarantee at-most-once creation? Well, the record-keeping requirements on the back end certainly make me question that!

For a separate ID issuance service, is issuing an ID that never gets used (say, due to a transient network failure and retry) really so bad? Having gaps in ID issuance appears to be psychologically damaging to certain personality types, and it *is* good to be conscientious about things "falling through the cracks". What can we do to sate that? I suppose a report could suffice to satisfy that curiosity. Maybe even a follow-up procedure to officially tombstone issued-but-unused IDs after a certain time period, if it's important that every one be accounted for.

That actually wouldn't be procedurally much different from allowing a client to double-post the same customer, causing two different customer IDs with the same data, and then having to administratively go back and delete one -- although the issued-but-unused-ID method leaves your data in potentially better shape in the interim.

For the ID issuance service, there is still an itch that I want to scratch: misbehaving clients (let's say due to a bug) requesting lots of IDs. You could just say "Who cares if 10 million IDs were issued but unused before we fixed the problem?" and go on about your day. But in principle, I'm unlikely to say that. And I haven't thought through the options enough to determine a good remedy. A throttle is the immediate answer that comes to mind, but that's terribly boring, and it makes me think I really need to look at the problem differently to illuminate better options.

Oh, and by the way...

And finally, I want to say that exploring ID generation has brought up a very important shift in my thinking about IDs -- that is, an ID is metadata about an object rather than part of its content. When you look at most existing code, you see the entity ID being part of the entity itself. But ID issuance begs to differ, because the ID must be known before the entity can even be created. The client-generated GUID (which is also ID issuance, just from the client instead of the server) takes the same tack. In fact, the SQL database itself takes this tack, in that it has to know the next available ID before it inserts the data. But our code has been so tightly integrated with SQL implementation details that it was taken as a given that the ID needs to be part of the data itself. Really, that is just what SQL requires: the ID being part of the data row.

Functionally, the data itself doesn't usually give a flying rip about its own ID number when actually doing work. The important part is that the infrastructure knows about identity and can appropriately provision work and load data when given an ID. NOTE: In the relational DB world, an entity may care about the ID number of a *separate* entity insofar as it needs to ask the infrastructure to load that other entity's data.

Anyway, this realization has affected how I model entities (e.g. in DDD, no more aggregate IDs on the aggregates themselves), and so I thought it was worth mentioning.
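
In F# terms, the shift looks something like this hypothetical sketch, where identity lives beside the entity rather than inside it:

// identity and versioning are metadata wrapped around the content
type Envelope<'entity> =
    { Id: string
      Version: int
      Data: 'entity }

// the entity itself carries no ID
type Customer =
    { Name: string
      Email: string }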