Transforming XML data using the F# XML type provider
10 Jul 2016
At some point you’ll find yourself doing ETL. Whether it’s loading a few GB of data into SQL Server or consuming tiny MQ messages, one step is to parse (and potentially validate) data. Usually, this is rather boring and tedious work. Unless…
Consider the following, vastly simplified messages:
<?xml version="1.0" encoding="UTF-8" ?>
<message xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Available>Foo</Available>
<Nil xsi:nil="true"/>
<Missing>Bar</Missing>
<Missing_Nil>1337</Missing_Nil>
</message>

<?xml version="1.0" encoding="UTF-8" ?>
<message>
<Available>Bar</Available>
<Nil>3.141</Nil>
</message>

A very convenient way of working with data of a shape unknown to the programmer in F# is to use so-called type providers.
An F# type provider is a component that provides types, properties, and methods for use in your program.
The XmlProvider can be configured with a sample file containing a list of messages:
type XmlMessage = XmlProvider<"messageSamples.xml", SampleIsList=true, Global=true>

<?xml version="1.0" encoding="UTF-8" ?>
<messages xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<message>
<Available>Foo</Available>
<Nil xsi:nil="true"/>
<Missing>Bar</Missing>
<Missing_Nil>1337</Missing_Nil>
</message>
<message>
<Available>Baz</Available>
<Nil>3.141</Nil>
</message>
<message>
<Available></Available>
<Nil>2.718</Nil>
<Missing_Nil xsi:nil="true"/>
</message>
</messages>

The generated type is then used to load a file containing only one message element:
let message = XmlMessage.Load("message.xml")

Extraction, fast and easy
The type provider generates types equivalent to the following ones:
type Message = {
    Available : string
    Missing : string option
    MissingNil : MissingNil option
    Nil : Nil
}
and MissingNil = {
    Nil : bool option
    Value : int
}
and Nil = {
    Nil : bool option
    Value : decimal
}

For loading, however, the data should be in a format suitable for sending to a database:
type DbMessage = {
    Available : string
    Missing : string
    MissingNil : Nullable<int>
    Nil : Nullable<decimal>
}

Again, this type is generated by a type provider: this time SQLProvider.
Our first approach for mapping from XML to DB types looked like this:
let map (xmlMessage : XmlMessage.Message) =
    let missingNil =
        xmlMessage.MissingNil
        |> Option.bind (fun n -> match n.Nil with Some true -> None | _ -> Some n.Value)
        |> Option.toNullable
    let nil =
        // Parenthesized so that the pipe applies to the whole match, not just its last branch.
        (match xmlMessage.Nil.Nil with Some true -> None | _ -> Some xmlMessage.Nil.Value)
        |> Option.toNullable
    {
        Available = xmlMessage.Available
        Missing = xmlMessage.Missing |> Option.toObj
        MissingNil = missingNil
        Nil = nil
    }

From this small example it might not be obvious: xmlMessage.MissingNil and xmlMessage.Nil are of two different types (MissingNil and Nil), even though both carry a Nil flag and a Value.
So just extracting the match into a shared function won't get us much further.
By now you are surely concerned about the duplicated code for extracting nillable, potentially optional values.
Just like I was. Fortunately there is…
The missing piece is a transformation from nillable to nullable:
type Nillable<'T> = {
    Nil : bool option
    Value : 'T
}

Nullable<'T>

This requires subsuming all Nil types in a generic Nillable<'T>. Per se, this cannot be done in F# as there are no partial classes.
In a dynamic language we would probably just assign Value.
In a typed mainstream language (such as C#) we might make Nil implement a generic interface (through partial classes) or use reflection.
While being safer, implementing the interface is more work.
Also, the knowledge of how to uniformly treat different types should arguably not be embedded within these types.
As there are Nillable<'T> options as well and we do know how to transform Options and Nullables,
trying to come up with a function Nillable<'T> -> 'T option feels natural.
Also, I wanted to stick with options for validation (out of scope) as long as possible.
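The conversions between options and nullables mentioned above are already part of FSharp.Core's Option module. As a small sketch (the value names are made up for illustration):

```fsharp
open System

// Option -> Nullable works for value types.
let someInt : Nullable<int> = Some 1337 |> Option.toNullable
let noInt : Nullable<int> = None |> Option.toNullable      // HasValue = false

// Option -> plain reference works for reference types; None becomes null.
let someText : string = Some "Bar" |> Option.toObj
let noText : string = None |> Option.toObj

// And back again.
let roundTrip : int option = someInt |> Option.ofNullable  // Some 1337
let fromNull : string option = noText |> Option.ofObj      // None
```

So once a value is an option, getting to the database-friendly representation is a one-liner. What is still missing is the step from a nillable element to an option.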
In F# (and other languages supporting structural typing) there are so-called constraints on statically resolved type parameters, specifically member constraints. They make it possible to constrain a parameter to all types having certain members:
let inline optionOfNillable n =
    match (^N : (member Nil : bool option) n) with
    | Some true -> None
    | _ -> Some (^N : (member Value : 'T) n)

The most eye-catching features here are the two (^X : (member Y : Z) x) expressions:
^X is a so-called statically resolved type parameter. In contrast to generic ones, statically resolved type parameters get replaced at compile time.
member Y : Z is the constraint on type ^X to have a member Y of type Z.
(member Y : Z) x applies the member to an instance of ^X.
The inline keyword instructs the compiler to generate a copy of the function for every resolved type instead of restricting it to one specific resolution.

Executing the above snippet in fsi yields:
val inline optionOfNillable :
  n: ^N -> 'T option
    when ^N : (member get_Nil : ^N -> bool option) and
         ^N : (member get_Value : ^N -> 'T)

So we got a function from a parameter n of some constrained type ^N to 'T option.
^N doesn’t need to be a Nillable. Having two members suffices: Nil, returning a bool option, and Value, returning an arbitrary value.
All our *.Nil types satisfy this condition.
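To see this without a type provider at hand, here is a minimal sketch. DecimalNil and IntNil are hypothetical stand-ins for two provided element types; they are unrelated classes that merely happen to expose a Nil and a Value property each:

```fsharp
// Hypothetical stand-ins for two provided element types (names are assumptions).
type DecimalNil(nil : bool option, value : decimal) =
    member this.Nil = nil
    member this.Value = value

type IntNil(nil : bool option, value : int) =
    member this.Nil = nil
    member this.Value = value

// Same function as in the text: any type with matching Nil and Value members is accepted.
let inline optionOfNillable n =
    match (^N : (member Nil : bool option) n) with
    | Some true -> None
    | _ -> Some (^N : (member Value : 'T) n)

let pi = optionOfNillable (DecimalNil(None, 3.141m))    // decimal option
let nilled = optionOfNillable (IntNil(Some true, 0))    // int option, None
let leet = optionOfNillable (IntNil(Some false, 1337))  // int option
```

The two classes share no interface or base class, yet one function handles both, and the result type follows the type of Value.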
To align our implementation with existing option functionality, we extend Option like this:
type Option<'T> with
    static member inline ofNillable n =
        match (^N : (member Nil : bool option) n) with
        | Some true -> None
        | _ -> Some (^N : (member Value : 'T) n)

and create a new Nillable type:
type Nillable =
    static member inline toNullable n =
        Option.ofNillable n |> Option.toNullable
    static member inline toNullable n =
        Option.bind Option.ofNillable n |> Option.toNullable

As stated before, this is just for demo purposes. In our production code we stick to options a bit longer and change to Nullables only after validation.
Let’s have a look at the type of the second Nillable.toNullable overload:
toNullable : n: ^a option -> System.Nullable<'b>
    when ^a : (member get_Nil : ^a -> bool option) and
         ^a : (member get_Value : ^a -> 'b) and
         'b : (new : unit -> 'b) and 'b : struct and
         'b :> System.ValueType

Great! The compiler figured all of that out by itself. Imagine having to type all that…
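The same composition can be tried in isolation. The following is only a sketch under a few assumptions: IntNil is a hypothetical stand-in for a provided element type, and the two conversions get distinct, made-up names (nullableOfNillable, nullableOfOptionalNillable) instead of being overloads, purely to keep the example short:

```fsharp
open System

// Hypothetical stand-in for a provided element type.
type IntNil(nil : bool option, value : int) =
    member this.Nil = nil
    member this.Value = value

let inline optionOfNillable n =
    match (^N : (member Nil : bool option) n) with
    | Some true -> None
    | _ -> Some (^N : (member Value : 'T) n)

// Mandatory element: nillable -> option -> Nullable.
let inline nullableOfNillable n =
    optionOfNillable n |> Option.toNullable

// Optional element: a missing element and a nil element both end up as null.
let inline nullableOfOptionalNillable n =
    n |> Option.bind (fun x -> optionOfNillable x) |> Option.toNullable

let present = nullableOfOptionalNillable (Some (IntNil(None, 1337)))
let nilled = nullableOfOptionalNillable (Some (IntNil(Some true, 0)))
let missing = nullableOfOptionalNillable (None : IntNil option)
let mandatory = nullableOfNillable (IntNil(Some false, 42))
```

All four results are Nullable<int>; the struct and value type constraints are inferred from Option.toNullable, just as in the signature above.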
Finally, this leads to concise mapping code:
let map (xmlMessage : XmlMessage.Message) =
    {
        Available = xmlMessage.Available
        Missing = xmlMessage.Missing |> Option.toObj
        MissingNil = xmlMessage.MissingNil |> Nillable.toNullable
        Nil = xmlMessage.Nil |> Nillable.toNullable
    }

Transform, less fast but fun
Wait, already? What about the Load in ETL? Well, with the transformed DbMessage in hand, this is almost boring.
We’ve seen an application of structural typing that turns F#'s lack of partial classes into a competitive advantage. And we’ve seen that type inference especially shines with statically resolved type parameters and member constraints.