Sunday, December 14, 2008

Data Migration with HApps-Data

HAppS applications, like any application with persistent data storage, are faced with the issue of migrating existing data when the format of the persistent data is changed. This tutorial will explore the binary serialization and migration facilities provided by HAppS-Data. If you think I am doing it all wrong, please let me know. Writing this tutorial is the extent of my experience using the HApps-Data migration facilities.


Requirements


This tutorial only uses the HAppS-Data (and dependencies) portion of HAppS. It has been tested with HAppS-Data 0.9.3. The first three lines of the module look like this:


> {-# LANGUAGE TemplateHaskell, UndecidableInstances, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses, DeriveDataTypeable, TypeFamilies #-}
> module Main where
> import HAppS.Data

Serialization


The most obvious way to serialize data in Haskell is to use the familiar Read and Show classes. Simply use show to turn a value into a String, and read to turn a String back into a value. This method has three serious flaws however:



  1. The law read . show == id does not hold for all Show/Read instances.
  2. The serialized representation is rather verbose
  3. No migration path when types change, leaving your old data inaccessible

HAppS-Data provides a Serialize class which addresses these three issues. From an end user point of view the Serialize functionality provides three items of interest:

  1. The Serialize class
  2. the serialize and deserialize functions
  3. The deriveSerialize function

> class (Typeable a, Version a) => Serialize a where 
> ...
>
> serialize :: Serialize a => a -> Lazy.ByteString
> deserialize :: Serialize a => Lazy.ByteString -> (a, Lazy.ByteString)
>
> deriveSerialize :: Language.Haskell.TH.Syntax.Name
> -> Language.Haskell.TH.Syntax.Q [Language.Haskell.TH.Syntax.Dec]

The Version superclass is used during data migration. The serialize and deserialize functions are the counterparts to show and read. deriveSerialize is a Template Haskell function which provides functionality similar to deriving (Read, Show).


The Version class


The Version class is very straight-forward. It consists of a single function which returns the Mode (aka, the version) of a datatype.


>
> class Version a where
> mode :: Mode a
> mode = Versioned 0 Nothing
>
> data Mode a = Primitive -- ^ Data layout won't change. Used for types like Int and Char.
> | Versioned (VersionId a) (Maybe (Previous a))
>
> newtype VersionId a = VersionId {unVersion :: Int} deriving (Num,Read,Show,Eq)

There are two categories of datatypes:



  • primitives whose layout will never change, and, hence, will never need to be migrated
  • everything else

The Versioned constructor takes two arguments. The first argument is a version number which you increment when you make an change to the data-type. The second argument is an indicator of the previous version of the data-type. The exact details are covered in the next section.

Putting it all together


Let's say we have the following types:


>
> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data Foo
> = Bar String
> | Baz Beep
>
> data Beep
> = Beep
> |])

The deriveAll template haskell function is similar to the normal haskell deriving clause, except it also has the ability to derive Default instances. Additionally, it always derives Typeable and Data instances even though they are not explicitly listed.


To make the types serializeable we first need to create Version instances.


> instance Version Beep where
> mode = Versioned 0 Nothing
>
> instance Version Foo where
> mode = Versioned 0 Nothing

We want to indicate that Beep and Foo are non-primative types, so we use the Versioned constructor. Next we specify a version number for the type. It could be anything, but 0 is the most sensible choice. Since there are now previous versions of these types we mark the previous type as Nothing.

For all non-primitive types the initial version of Versioned 0 Nothing is sensible. So the Version class provides it as a default value for mode:


> class Version a where
> mode :: Mode a
> mode = Versioned 0 Nothing

Hence, we could shorten our Version instances from above to:


> instance Version Beep
> instance Version Foo

Next we derive Serialize instances for our types:


> $(deriveSerialize ''Beep)
> $(deriveSerialize ''Foo)

Now we can use serialize to serialize values. Let's look at the output of serialize Beep



*Main> Data.ByteString.Lazy.unpack $ serialize Beep
[0,0,0,0,0,0,0,0,0]
*Main>

We see that Beep serializes to 9 bytes. The first 8 bytes represent the VersionId. VersionId is basically an Int, and the serialization code always treats Ints as a 64-bit values to avoid cross-platform issues. The final byte indicates which constructor of Beep was used. In this case the zeroth constructor was used.


At first it may seem like we don't have enough information here to deserialize the data, after all there are no type names, constructors, etc. But deserializing these bytes is no different than doing read "1" :: Int. Because we know the type of the value we want to be reading at compile time, we do not need to record that information in the stored data. We just do:



*Main> deserialize (serialize Beep) :: (Beep,ByteString)
(Beep,Empty)
*Main>

As a side note, Strings are serialized to a very compact representation. In fact, they are stored as compactly as ByteStrings because they are first converted to a ByteString.



*Main> Data.ByteString.Lazy.unpack $ serialize "hello"
[0,0,0,0,0,0,0,5,104,101,108,108,111]
*Main> Data.ByteString.Lazy.unpack $ serialize (Data.ByteString.Lazy.Char8.pack "hello")
[0,0,0,0,0,0,0,5,104,101,108,108,111]
*Main>

The first 8 bytes are the length of the String, and the remaining bytes are the utf-8 encoded characters of the String.


So, if you application is best served by using Strings instead of ByteStrings, you do not have to take an extra steps to ensure that the serialized data is compactly represented.


Simple Migration


Let's say we want to add another constructor to the Beep type. As a first pass, we will actually create a whole new type named Beep', which is similar to the old type, but has an additional constructor BeepBeep.


> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data Beep' = BeepBeep' | Beep'
> |])
>
> $(deriveSerialize ''Beep')

Because we are extending a previous type, our Version instance will look a bit different:


> instance Version Beep' where
> mode = extension 1 (Proxy :: Proxy Beep)

This indicates that we are extending the old type Beep. The new version number must be higher than the old version, but does not have to be strictly sequential.


Because we specified that this type is a newer version of an older type, we also need to tell HAppS how to migrate the old data to the new type. To do this, we simply create an instance of the Migrate class.


> class Migrate a b where
> migrate :: a -> b

The Migrate class is quite simple, it contains a single function, migrate which migrates something of type a to type b. In our current example, all we need is:


> instance Migrate Beep Beep' where
> migrate Beep = Beep'

We can demonstrate migration by serializing a value of type Beep and deserializing it as type Beep'. The migration happens automatically in the deserialize function.



*Main> fst $ deserialize (serialize Beep) :: Beep'
Beep'
*Main>

When deserialize tries to deserialize the data produced by serialize Beep, it will first check the version number. When it sees that the version number in the stored data is lower than the version number of the current type it will instead try to decode it as the type you specified as the previous version. If the version associated with the previous type is still higher than the value in the serialized data, the migration code will recurse until it finds a matching version number. Once it finds a matching version number, it will call the corresponding deserialization "instance" to decode the old data. Then as the recursion unwinds, it will apply the migrate function to migrate the data to newer and newer formats until it reaches the newest format.


Managing History


A big issue in the above example is that when we added the new constructor we also changed the name of the type and its existing constructors. That is not very convenient in a real application where you have a multitude of references to the old names.


Fortunately, we do not have to change the name of the type to add a new constructor. As we saw in the beginning, the name of the type and the names of the constructors are not actually stored in the serialized data. So, instead we can change the name of the old type from Beep to OldBeep and update its constructor as well.


> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data OldBeep = OldBeep
> |])
>
> $(deriveSerialize ''OldBeep)
> instance Version OldBeep

Because OldBeep and Beep have the same shape, they will serialize to the same byte sequence:


*Main> Data.ByteString.Lazy.unpack $ serialize OldBeep
[0,0,0,0,0,0,0,0,0]
*Main> Data.ByteString.Lazy.unpack $ serialize Beep
[0,0,0,0,0,0,0,0,0]
*Main>

that means we can serialize an OldBeep value and then deserialize it as a Beep value, like this:



*Main> fst $ deserialize (serialize OldBeep) :: Beep
Beep
*Main>

Note that this is not the same as migration. Here we are just exploiting the fact that because the type name and constructor names are not encoded in the serialized data we can change those names and still be able to deserialize the data.


Full Migration Example #1


Here is the full example which shows:



  1. Beep renamed to OldBeep
  2. the new Beep with the extra constructor
  3. the migration code from OldBeep to Beep

> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data OldBeep
> = OldBeep
> |])
>
> instance Version OldBeep
> $(deriveSerialize ''OldBeep)
>
>
> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data OldBeep
> = OldBeep
> data Beep = BeepBeep | Beep
> |])
>
> instance Version Beep where
> mode = extension 1 (Proxy :: Proxy OldBeep)
>
> $(deriveSerialize ''Beep)

Using separate files to manage type history


Keeping all the revisions of your type in one file, and changing the name of the type and its constructors every revision is tedious and hard to manage. Instead, we can use a system where we rename the files that contain our types. To start, we will put the types we want to serialize in a separate file (or files), such as Types.lhs.


> {-# LANGUAGE TemplateHaskell, UndecidableInstances, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses, DeriveDataTypeable, TypeFamilies #-}
> module Types
> ( Bar(..)
> , Foo(..)
> ) where

> import HAppS.Data

>
> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data Foo
> = Bar String
> | Baz Beep
>
> data Beep
> = Beep
> |])
>
> instance Version Beep
> $(deriveSerialize ''Beep)
> instance Version Foo
> $(deriveSerialize ''Foo)

Now let's say we want to add a constructor Bop to the type Foo. First we rename Types.lhs to Types_000.lhs and change the module name to reflect the changed file name:


> {-# LANGUAGE TemplateHaskell, UndecidableInstances, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses, DeriveDataTypeable, TypeFamilies #-}
> module Types_000
> ( Beep(..)
> , Foo(..)
> ) where

> import HAppS.Data

>
> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data Foo
> = Bar String
> | Baz Beep
>
> data Beep
> = Beep
> |])
>
> instance Version Beep
> $(deriveSerialize ''Beep)
> instance Version Foo
> $(deriveSerialize ''Foo)

Next we create a new Types.lhs:


> {-# LANGUAGE TemplateHaskell, UndecidableInstances, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses, DeriveDataTypeable, TypeFamilies #-}
> module Types
> ( Beep(..) -- ^ re-exported from Types_000
> , Foo(..) -- ^ extended here
> ) where

> import HAppS.Data

We import the old Types_000 qualified as T0 to avoid name clashes.


> import qualified Types_000 as T0

Since we are only modifying Foo, we can import and re-export the old version of Beep unmodified (also see the module export list above):


> import           Types_000 (Beep)

Then we extend Foo with the new constructor Bop Int:


>
> $(deriveAll [''Eq,''Ord,''Read,''Show, ''Default]
> [d|
> data Foo
> = Bar String
> | Baz Beep
> | Bop Int
> |])

Next we create a Version instance which indicates that the previous version of Foo is T0.Foo.


>
> instance Version Foo where
> mode = extension 1 (Proxy :: Proxy T0.Foo)
>
> $(deriveSerialize ''Foo)
>

And, finally, we specify how to migrate the old data:


> instance Migrate T0.Foo Foo where
> migrate (T0.Bar str) = Bar str
> migrate (T0.Baz beep) = Baz beep

Note that Foo in Types.lhs and Foo in Types_000.lhs are different types, namely Types.Foo and Types_000.Foo. So this works for the same reason that renaming Beep to OldBeep works.


Serializing Datatypes from 3rd Party Libraries


It is also possible to serialize datatypes from 3rd party libraries, provided those types have Data and Typeable instances. There is a caveat with this however. If the third party library changes the type, then you will not be able to read your data. This is not a fatal flaw however. You can simply copy the old type definition into a local module, and then migrate the old data to the new format.


Suggested Policy



  1. Put the types you will serialize in one or more files which only contain types
  2. Deploy your web 2.718 killer app
  3. Before you do any more development, copy the current type files to sequential versions and create new current type files which re-export all the types. You can skip this step if the current type file only contains re-exports. i.e., if no type changes were made to that type file during the previous iteration.
  4. Make changes for current development cycle, and then go to step 2.

Friday, July 18, 2008

HTML Templating in HAppS using HSP

HAppS (the Haskell Application Server framework) recently got a whole lot more awesome when it added support for HTML templating using HSP (Haskell Server Pages).


HSP is an extension to Haskell which allows you to use XML syntax in your Haskell source. It is implemented as a preprocessor, trhxs, which transforms the source into regular Haskell code. Unlike some templating systems, you are not restricted to simple string substitution, or a limited templating language. Using HSP we can use the full-power of Haskell in our templates.


In this quickstart guide I will give a brief overview of how I use HSP with HAppS. I assume you already know the basics of using HAppS and so I do not cover those aspects, unless they differ when using HSP. This quickstart is very heavy on the "what to do", but a bit light on the "why" portion. The biggest contribution is the working example directory, which gets you started on the right foot.


Extracting and Running this QuickStart


This quickstart is written in Literal Haskell, and can be executed directly. But, more usefully, I have also written scripts which will automatically extract the code from this quickstart and create a nice template directory to start your projects with.


Among other things, you must be using a version of trhsx which was built against haskell-src-exts >= 0.3.5. And you should be using a version of HAppS-Server that has my support for on-the-fly output validation. However, it is trivial to comment out the validation support if you need to.


You can get a copy of this quickstart via


darcs get http://src.seereason.com/examples/happs-hsp-quickstart/


To recreate this quickstart as a .html file, just run:


make


To run the quickstart in-place do:


make test


and then point your browser at http://localhost:8000/


To load the quickstart Main.lhs into GHCi, set the option -ipages


To extract the code into a usable template run:


make template


This will create a subdirectory named template which you can copy, modify, and use for your own projects.


A Very Simple HSP Example


The following example highlights the basics of mixing XML and Haskell together. This example uses HSP, but not HAppS.


The first thing to notice is the extra options we pass to GHC, namely -F -pgmF trhsx. This tells GHC (and GHCi) to call trhsx to pre-process the source code before trying to compile it. trhsx will automatically translate the XML syntax into normal Haskell code which the compiler can understand.


> {-# OPTIONS_GHC -F -pgmF trhsx #-}
> module Main where
>
> import Data.Char (toUpper)
> import HSP
> import HSP.HTML

Next we see how to write a simple function which generates XML using the special XML syntax as well as dynamically creating some of the XML using ordinary Haskell expressions.


>
> -- |hello creates a simple hello <noun> page
> hello :: String -> HSP XML
> hello noun =
> <html>
> <head>
> <title>Hello, <% noun %></title>
> </head>
> <body>
> <p>Hello, <% map toUpper noun %></p>
> </body>
> </html>
>

To use the XML syntax, we just use it -- no special escaping is required to insert XML in the middle of a Haskell expression. On the other hand, if we want to evaluate a Haskell expression and insert it into the XML, then we need the special <% %> escape clause. If we had just written, <title> Hello, noun</title> then the title would have been Hello, noun. Likewise, if we did not have the <% %> around map toUpper noun, the page would say, "Hello map toUpper noun" instead of evaluating map toUpper noun and substituting the result in.


We can evaluate any arbitrary expression inside the <% %>, provided HSP knows how to turn the result into an XML value. By default, HSP knows how to handle String, Char, numbers, and some other common Haskell data-types. You can add additional class instances if you want to convert other datatypes (including your own) to XML automatically.


Next we have a simple main. The first line evaluates hello and gets back the generated XML. The second line uses renderAsHtml to turn the XML into HTML. renderAsHtml expects you to pass in valid XHTML which it will transform into valid HTML. However, it does not check the validity of the input or output. We will do that using a more general mechanism in the HAppS code.


>
> main :: IO ()
> main =
> do (_, xml) <- evalHSP (hello "World") Nothing
> putStrLn (renderAsHTML xml)
>

If we run this example, we get the following output:



<html
><head
><title
>Hello, World</title
></head
><body
><p
>Hello, WORLD</p
></body
></html
>

The formatting probably looks a bit funny to you -- but the web browser will read it fine. In HTML, the whitespace between the open and close tags is sometimes significant. However, the whitespace inside the open or close tag itself is never significant. The rendering algorithm is designed to exploit those properties to ensure it never adds significant whitespace where you didn't explicit have it in the input file.


Integrating HSP with HAppS


The integration of HSP with HAppS brings a few things to the table:



  • Dynamic HSP recompilation and loading - this means you can modify your HSP templates with out having to completely recompile and restart your HAppS application. Recompiling an HSP page usually happens faster than I can switch from emacs to the web browser and hit reload.
  • JSON data serialization - in order to pass data from HAppS to the HSP template, the data is serialized as a JSON object. This can be especially useful if you want to extend your application to support AJAX or if you want to expose your API to third parties using JSON, because you can use a single unified JSON API for all three instead of having the JSON API be some extra thing that is tact on.

If you decide you do not like using JSON and the dynamic page loading provided happs-hsp-template, but you do like HSP, it should be fairly straight forward to use HSP directly in your HAppS application, similar to how you would use Text.XHtml. However, this quickstart will focus on using happs-hsp-template.


Basic framework

This is the basic framework for creating a HAppS application using:



  • HAppS-Server for HTTP, cookies, etc
  • HAppS-State for persistent storage (aka, our database)
  • HSP for HTML templating
  • RJson for JSON marshalling

> module Main where
>
> import Control.Concurrent
> import Control.Monad
> import HAppS.Server
> import HAppS.State
> import HSP
> import HAppS.Template.HSP
> import HAppS.Template.HSP.Handle
> import Interface
> import State
> import System.Environment
> import Text.RJson

This main function is mostly boilerplate.


>
> main :: IO ()
> main =

newStore just creates an IORef to a Map which will map template source files to compiled object files.


>    do store <- newStore

startSystemState starts the state transaction system.


>       control <- startSystemState entryPoint

Next we parse the command-line arguments to extract a default config file. We also enable validation using wdg-html-validator. In theory, validation should be enabled and disable via a command-line argument, but I have not quite figured out how to add the desired functionality to parseConfig. If you do not have a version of HAppS-Server which support validation, then just change the last line to Right c -> c.


>       eConf <- liftM parseConfig getArgs
> let conf = case eConf of
> Left e -> error (unlines e)
> Right c -> c { validator = Just wdgHTMLValidator }

We then fork off the HTTP process in a new thread. This will take care of handling all incoming requests for us in the background.


>       tid <- forkIO $ simpleHTTP conf $ impl store

Now we are just waiting around for someone to terminate us.


>       putStrLn "running..."
> waitForTermination

Finally we cleanly shutdown the system. The system will not get corrupted if it is shutdown unexpected or improperly, however it may be possible that some pending transactions are lost. (That is true of most (or all?) database systems. The Consistency part of ACID means that the database will not get corrupt. That neccesarily means that sometimes data will be lost during an abrupt shutdown or intentionally thrown out during recovery.)


>       killThread tid
> shutdownSystem control
> destroyStore store

entryPoint is just there to provide type information to startSystemState. The type should be the one you want stored in your State. In our case that is HitCounter.


>
> entryPoint :: Proxy HitCounter
> entryPoint = Proxy

The impl function is your typical HAppS-Server impl function with a few new constructs:


>
> impl :: Store -> [ServerPart Response]
> impl store =
> [ runHSPHandle "pages" "objfiles" store $ multi
> [ dir "json"
> [ dir "times"
> [ path $ \c ->
> [ method GET $ ok (toResponse (toJson (if c == (1 :: Integer) then "time" else "times")))
> ]
> ]
> ]
> , method GET $ do hits <- webUpdate IncHits
> addParam "hits" hits
> ok =<< execTemplate Nothing "Index.hs"
> ]
> ]

runHSPHandle takes four arguments



  1. the directory which holds the templates
  2. the directory to store the compiled template object files in
  3. a handle to the store which manages the objects
  4. a ServerPart a

dir "json" provides our JSON API, which can be accessed via HSP, client-side javascript, or as a 3rd party API. We currently only provide one API call, times. It takes an integer and returns "time" if it is 1, otherwise "times". The toJson function converts the value into JSON data. Our JSON API must be inside runHSPHandle.


The addParam function adds the hits value to the environment used by the page template later. This allows us to pass information to the page template without having to add a JSON API call. We can not always use this method because we may not know what information the page template will need until we run it. More on this later.


The execTemplate function is responsible for actually invoking a specific template. It takes two arguments:



  1. default XMLMetaData (mime-type, doctype, etc), to use if the template does not provide any values.
  2. the name of the template file relative to the directory passed to runHSPHandle

JSON Interface


JSON, short for Javascript Object Notation, is a lightweight data-interchange format. It's primary appeal is that it is natively supported by Javascript. This means you can create javascript objects using JSON notation in the javascript program text, and you can easily convert between JSON and javascript objects at runtime.


happs-hsp-template uses RJson for converting Haskell values to and from JSON. See the README in the RJson tarball for more information on using RJson.


We define our datatypes which will be turned into JSON objects in pages/Interface.lhs. It is in the pages subdirectory because it needs to be imported by both our HAppS backend as well the the HSP templates. This ensures that they are both talking the same specification.


> {-# LANGUAGE TemplateHaskell, UndecidableInstances, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses, DeriveDataTypeable, TypeFamilies #-}

>
> module Interface where
> import HAppS.Data
> import HAppS.State

Next we define a type which will be used as JSON data:


>
> $(deriveAll [''Eq,''Ord,''Read,''Show,''Num,''Enum,''Default]
> [d|
> newtype HitCounter = HitCounter { hits :: Integer }
> |])

One key thing to note is that we use the record syntax in our type declaration. This is so that RJson can automatically convert our data-type into a JSON object. If we did not do this, we would have to manually declare instances of ToJson and FromJson for HitCounter.


It is critical that you read the RJson README to get an understanding of what it supports, or you will be mystified as to why your JSON data is getting silently corrupted.


Because we might want to store HitCounter in the HAppS State we also deriveSerialize for it, and create a Version instance. In theory this could also be useful if clients attempt to pass JSON data back to the server using an older format. However, I have no idea how to handle that in practice.


>
> $(deriveSerialize ''HitCounter)
> instance Version HitCounter

State Definition


In this quickstart our State will just be HitCounter. There is nothing HSP-specific about this module, aside from the fact that we have already defined HitCounter in Interface.lhs.


> {-# LANGUAGE TemplateHaskell, UndecidableInstances, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses, DeriveDataTypeable, TypeFamilies #-}
> {-# OPTIONS_GHC -F -pgmF trhsx #-}
> module State where
>
> import HAppS.Data
> import HAppS.State
> import Control.Concurrent
> import Control.Monad.State
> import Interface

>
> -- |increment hit counter, and return new value
> incHits :: Update HitCounter HitCounter
> incHits =
> do hc <- fmap succ get
> put hc
> return hc

>
> $(mkMethods ''HitCounter
> [ 'incHits
> ])

>
> instance Component HitCounter where
> type Dependencies HitCounter = End
> initialValue = HitCounter 0

Index Page


Now that we have our JSON interface and our State defined, we can implement a page template.


We will use the OPTIONS_GHC pragma to automatically pass this file through trhsx. happs-hsp-template will automatically add this command-line option when compiling pages, but adding it explicitly means we can easily load the file into GHCi.


> {-# OPTIONS_GHC -F -pgmFtrhsx #-}

> module Index where
>
> import Control.Monad.Trans
> import HSP
> import HAppS.Template.HSP
> import UACCT
> import HSP.Google.Analytics
> import HAppS.State
> import Interface

In our templates, the entry point is always page and it always has the type Web XML. (If you use HSP without HAppS then you will want HSP XML instead of Web XML). This is similar to how normal programs always start at main and main always has the type IO ().


>
> page :: Web XML
> page =
> withMetaData html4Strict $
> do (HitCounter hits) <- localRead "hits"
> times <- globalRead $ simpleRequest ("/json/times/" ++ show hits)
> page' hits times

Our page function does fours things.



  1. Sets the XML meta-data to html4Strict. This will:

    • render the page as HTML (instead of XML)
    • set the DOCTYPE to HTML 4.01 Strict.
    • set the content-type to "text/html"

  2. reads the "hits" value from the local environment.
  3. Uses the JSON API to determine if it should say, "time" or "times"
  4. Calls the page' function

The environment read by localRead is created way back in the impl function when we did:


>        addParam "hits" hits

globalRead simulates an HTTP request to the HAppS server. In this example, we just do a simple GET request to /json/times, but any type of HTTP request can be simulated.


If globalRead returns "Missing content-type", it is probably because your JSON API is not inside the runHSPHandle call.


The page' function is straight-forward HSP code. We could have put the contents in the page function itself, but I prefer to split it out, because it looks neater, and makes gives you the option of calling the page' function.


>
> page' :: Integer -> String -> Web XML
> page' hits times =
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
> <title>HitCounter 3000</title>
> <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
> <link type="text/css" rel="stylesheet" href="/css/hg.css" media="screen" title="default" />
> </head>
> <body>
> <h1>HitCounter 3000</h1>
> <p>This page has been viewed <% (show hits) ++ " " ++ times %>.</p>
> <% analytics uacct %>
> </body>
> </html>

Google Analytics Code


The UACCT module is just a place to keep our Google Analytics account code so that we can easily import it into all the pages on the site. Since you are likely to want to use a different Google Analytics code for each site, it makes more sense to keep this code in the pages directory on a per-project basis, rather than store it in a system-wide library.


> module UACCT where
>
> import HSP.Google.Analytics
>
> uacct :: UACCT
> uacct = UACCT "UA-4353757-1"

Conclusion


I hope you now have a basic idea of what is going on. The next step is to extract the template directory and attempt to make your own site.


In this quickstart, our State and our JSON interface are very closely related. In fact, they are just the HitCounter type. For a simple application such as this one, Storing your JSON objects directly in the State seems like a sensible approach. For other sites, you will find that you want to have one set of data types for your State, and a separate set of data types for your Interface, though some data types may still be shared between the two.

Thursday, July 3, 2008

Extending Asterisk with HAppS

I just glued two of my favorite technologies together, Asterisk (the opensource PBX/VoIP/etc system) and HAppS (the Haskell Application Server framework).



If you have heard of HAppS, but never used it, you may have the impression that HAppS is a web development platform -- but that is not quite correct. HAppS is actually a collection of several different server components which can be combined together or used separately.



In this post, I will show how to build as simple FastAGI server on top of the HAppS-State component. We will not be using the web component (HAppS-Server), which is the part that provides HTTP, templating, cookies, etc. This post assumes no prior knowledge of HAppS or AGI.



What You Will Need


If you want to build this demo you will need:



  • the latest version of HAppS-State and it's dependencies.
  • Asterisk (I think any version later than 1.0 should work. I use 1.4.17 from Ubuntu Hardy)
  • The haskell AGI library
  • darcs get http://src.seereason.com/fastagi-hitcounter


Run make HitCounter.hs to produce a nice, clean
.hs file from the .lhs file.



FastAGI



The asterisk server can be extended by using the Asterisk Gateway Interface (AGI). AGI provides the functionality you need to do stuff like "Please enter your 16-digit account number."



An AGI script is a standalone program you write (in a language of your choice). Asterisk communications with your AGI script by running it directly, and writing to its stdin and reading from its stdout. The AGI protocol consists of simple commands and
responses
which are human readable text.



Asterisk also has the option of communicating with your AGI script remotely via TCP instead of directly running a local program. This feature is known as FastAGI. The commands and responses are identical to normal AGI, the only differences are:



  • The communication channel is setup via TCP instead of forking off a local process
  • Some extra AGI variables are passed in by FastAGI


There is one additional importing difference, which is more of a side-effect. When using plain-old AGI, a new process will be spawned for each call. When using FastAGI, a new TCP connection will be opened -- typically to a single, long running server process. So, with AGI you will need to worry about how to provide communication and synchronization between multiple processes, but with FastAGI, you can just use threads.



HAppS-State



The HAppS State component provides in-memory state with ACID guarantees. It uses write ahead logging and checkpointing to ensure the state can be restored from disk in the event of a power outage, and also provides multimaster replication. Unlike a traditional relational database, HAppS-State works directly with native, arbitrary Haskell data types. This means you don't have to figure out how to get your beautiful data structures wedged into a relational database just to get ACID guarantees and replication.



Example Application


The remainder of the post is a simple example which implements a hit-counter. When you call the phone number, it tells you what caller number you are. I won't go into too much detail about the HAppS State portion, since this post is supposed to show how to integrate AGI, not how to use HAppS State.



> {-# LANGUAGE TemplateHaskell, UndecidableInstances, TypeFamilies, 
> TypeSynonymInstances, FlexibleInstances, DeriveDataTypeable,
> MultiParamTypeClasses, TypeOperators, GeneralizedNewtypeDeriving #-}


> module Main where
>
> import Control.Concurrent
> import Control.Monad
> import Control.Monad.Reader
> import Control.Monad.State
> import Control.Monad.Trans
> import HAppS.Data
> import HAppS.State
> import Network
> import Network.AGI
> import System.Random
> import System.Posix.Unistd

The first thing we do is define the type we will use to store our
persistent state (aka, our "database schema"). The deriveAll is
similar to deriving (Eq, Ord, Read, Show, Num, Enum, Default,
Data, Typeable)
. Since there is no way to extend
deriving, we have to use Template Haskell to add support
for deriving Default.



>
> $(deriveAll [''Eq,''Ord,''Read,''Show,''Num,''Enum,''Default]
> [d|
> newtype HitCounter = HitCounter { hits :: Integer }
> |])


deriveSerialize is part of the magic that allows HitCounter
to be serialized to disk or replicated between servers.



>
> $(deriveSerialize ''HitCounter)

The Version instance is used to migrate the
old data, if we modify the HitCounter data structure. That
is a subject for a different tutorial.



> instance Version HitCounter

Next we define a function which modifies the global state
(HitCounter). This function while be run
atomically. This means that there is no race condition between the
get and the put. The get and
put functions come from
Control.Monad.State.



>
> -- |increment hit counter, and return new value
> incHits :: Update HitCounter Integer
> incHits =
> do hc <- fmap succ get
> put hc
> return (hits hc)

This is the magic which converts the incHits function
into an atomic action for updating the global state.



>
> $(mkMethods ''HitCounter
> [ 'incHits
> ])

Next we define our top-level component which uses the global state. A more complex application might use a bunch of independent components similar to HitCounter. This allows us to easily build things like session support, user accounts, etc, in third party reusable libraries. I believe it also makes atomic actions finer grained and makes it possible to support shards.



>
> instance Component HitCounter where
> type Dependencies HitCounter = End
> initialValue = HitCounter 0

> entryPoint :: Proxy HitCounter
> entryPoint = Proxy

The main function starts up the state engine, forks off the fastAGI server, waits for a shutdown signal (for example, ^C), and then cleanly shuts down the state engine. The fastAGI function comes from the Haskell AGI library, and is in no way HAppS specific.



>
> main :: IO ()
> main =
> do control <- startSystemState entryPoint
> tid <- forkIO $ fastAGI Nothing agiMain
> putStrLn "running..."
> waitForTermination
> killThread tid
> shutdownSystem control

Here is our simple AGI application. It



  1. answers the call
  2. waits a second to give the caller time to finish setting up their end of the call
  3. increments the hit counter
  4. plays a pre-recorded file which says, "You are currently caller number"
  5. says the caller number
  6. plays a pre-recorded file which says "Goodbye."
  7. hangs up


The functions answer, streamFile,
sayNumber, and hangUp come from the AGI
library.



The update IncHits call is our database query. Note
that we don't call the incHits function
directly. Instead we call update and pass it the value
IncHits. The IncHits type was created for
us automatically by the call to mkMethods we made
earlier.



> 
> agiMain :: HostName -> PortNumber -> AGI ()
> agiMain hostname portNum =
> do answer
> liftIO $ sleep 1 -- give the caller time to get their end of the call setup
> h <- update IncHits
> streamFile "queue-thereare" [] Nothing
> sayNumber h []
> streamFile "vm-goodbye" [] Nothing
> hangUp Nothing
> return ()

Hooking it up


To test the application, first we need to update the asterisk dialplan to call our AGI application. Something like this should do the trick (be sure to reload the dialplan after modifying extensions.conf):



[default]
exten => 31415,1,AGI(agi://127.0.0.1)

Next we start our AGI application server:



$ runhaskell HitCounter.lhs

And finally, we dial 31415 and hope the magic happens.


Summary


The above code is a good starting template for a more interesting AGI application. Note that caller number is a bit fuzzy. The caller number is determined by who gets to the update function first -- which could be different from who actually connected to the asterisk server first.


Also, when calling a FastAGI application, it is possible to pass in a PATH and query string. The Haskell AGI library makes this information available, but does not provide any special mechanisms for doing something useful with it. This is likely to change in the future.