In this post I’ll discuss in detail an example found in “Real World Haskell” ([1]) and “Thinking Functionally with Haskell” ([2]). The point of this post is to explain in detail some important functional transformations from an inefficient program to an efficient one for computing the mean of a list of numbers. We’ll work only with finite lists. At the end of it, you’ll see that knowing a little more mathematics can help a lot with optimizing programs starting from inefficient algorithms.

Problem. Given a list of float numbers, write a function that computes the mean of these numbers. 

Solution: We can easily specify a function that computes the mean of a list of floats:

mean:: [Float] -> Float
mean [] = 0
mean xs = sum xs / fromIntegral (length xs)

The problem with this definition is that we have to loop twice through the elements of xs in order to compute the mean (because xs is needed in computing the sum and the length). Also, because xs is needed for sum and length, it will be retained in memory. This is a major space leak if we think of lists with millions of elements.

To avoid looping twice through the elements of xs, we need to compute sum and length in the same time. We’ll use the simple but powerful technique of tupling and define a function called sumlen:

sumlen :: [Float] -> (Float,Int)
sumlen xs = (sum xs, length xs)

The real challenge begins here: how can we define sumlen as a function that loops only once through xs?

Somehow we need to define a recursive definition of sumlen. Let’s define our plan to do that (explaining in detail every step from [2]):

  1. Find a recursive definition for sumlen that loops through xs only once.
  2. Find a definition of sumlen using foldr.
  3. Make explicit the transition from foldr to foldl and define sumlen as a foldl.
  4. We can use foldl’ instead of foldl and, in this way, hope for an algorithm that works in constant space.

1. Finding a recursive definition for sumlen that loops through xs only once.

First of all, I define sumcomp to be the sum by components of two pairs of numbers like this:

And now let’s do a little computation in order to transform sumlen into a recursive function:

I write  [x1,x2,…,xn] = x1 : [x2,…,xn] and I define xs = [x2,…,xn]. With this in mind, we proceed like this:

 

So (rewritting a little to be more conventional: x=x1): sumlen(x:xs) = (x+s,1+n), where (s,n) = sumlen xs. From the previous inefficient specification of sumlen, it’s clear that sumlen [] = (0,0).

We achieved our goal of computing the sum and length of a list of numbers with only one looping through the elements:

sumlen :: [Float] -> (Float,Int)
sumlen []= (0,0)
sumlen(x:xs) = (x+s,1+n)
        where (s,n) = sumlen xs;

Remark: We found that specifying an inefficient algorithm first is not a bad thing. We can use then symbol manipulations to compute more efficient definitions starting with the inefficient definition. This is a very powerful technique in functional programming and you can see more elaborate examples in Richard Bird’s book “Pearls of Functional Algorithm Design” ([3]).

2. Finding a definition of sumlen using foldr.

For this, we need more mathematical machinery, because it simplifies things a lot. I’ll use the universality theorem described by Graham Hutton in [4]:

Theorem:

    g = foldr f v
      <=>
    g [] = v
    g (x : xs) = f x (g xs)

This is a very powerful theorem stating that we can write a function g as a foldr if and only if the function g respects two equations (one for the empty list and one for the general list (x:xs)).

In my case, the function g = sumlen. So I want a definition like sumlen = foldr f v. Our task is to find the actual f and v for our instance.

The first equation of the theorem says that sumlen [] = v. But we found that sumlen [] = (0,0), so v = (0,0).

The second equation of the theorem says that: sumlen (x:xs) = f x (sumlen xs). But sumlen (x:xs) = (x+s, 1+n), where (s,n) = sumlen xs. So we have that:

f x (sumlen xs) = (s+x, n+1), where (s,n) = sumlen xs. If I raise (sumlen xs) to a general variable y, we’ve found our function f:

f x y = (s+x, n+1), where y = (s,n); or, replacing y with (s,n), we have f x (s,n) = (s+x,n+1).

In this way, we’re sure the right part of the universality theorem holds and, applying the theorem, we conclude that:

sumlen :: [Float] -> (Float,Int)
sumlen = foldr f (0,0)
    where f x (s,n) = (s+x,n+1);

Remark: As you can see, the universality theorem of foldr is a very powerful technique of transforming a function into a foldr. It can be used in many more contexts and you can find more about it in the article of Graham Hutton ([4]).

3. Making explicit the transition from foldr to foldl and define sumlen as a foldl.

In order to transform the foldr definition of sumlen into a foldl, we’ll need again more mathematical machinery that simplifies the process. We’ll use a theorem proved in [2], stating that under certain conditions, a foldr can be transformed into a foldl:

Theorem:

For all finite lists xs, we have:

foldr f e xs =foldl g e xs

provided that the following equations hold:

f x (g y z) = g (f x y) z  (eq. 1)
f x e = g e x (eq. 2)

For our example, we want to find a function g such that: foldr f (0,0) xs = foldl g (0,0) xs, where f x (s,n) = (s+x,n+1).

It’s clear from the given data that e = (0,0).

I’ll use the second equation (eq. 2), because is the simpler one and can give me some informations about g in the base case (0,0):

f x (0,0) =g (0,0) x

f x (0,0)

={definition of f}

(x,1)

So we have that g (0,0) x = (x,1), for all x of appropriate type. (R1)

But we also must preserve the equation f x (g y z) = g (f x y) z, for all x,y,z of appropriate types.

Trying to use R1, I’ll instantiate y = (0,0) in the equation eq.2 and obtain:

f x (g (0,0) z) = g (f x (0,0)) z = {definition of f} g (x,1) z.

So we have: f x (g (0,0) z) = g (x,1) z (i)

But f x (g (0,0) z) = {using R1} f x (z,1). (ii)

From (i) and (ii) we conclude that f x (z,1) = g (x,1) z. (iii)

But f x (z,1) = {definition of f} (x+z,2) (iv)

From (iii) and (iv) we have that g (x,1) z = (x+z,2). Straightforward generalizations gives us:

g (x,n) z = (x+z,n+1).

Exercise: Verify the found function g satisfies eq.1 and eq.2 of the theorem.

Because g satisfies eq.1 and eq.2, the theorem says that:

foldr f (0,0) xs = foldl g (0,0) xs, with g (x,n) z = (x+z,n+1).

So we arrived at sumlen = foldr f (0,0) = foldl g (0,0), with g (x,n) z = (x+z,n+1), which was our task.

4. Because we’re working with strict functions, we can use foldl’ instead of foldl and we have:

sumlen :: [Float] -> (Float,Int)
sumlen = foldl’ g (0,0)
    where g (x,n) z = (x+z,n+1);

To make it work in constant space, follow the explanations from [1] (Chapter 25) and rewrite sumlen like this:

This function can be much more optimized, but you can read about that in [1], chapter 25.

As an important conclusion, knowing mathematics can help a lot in optimizations and that is part of the beauty of functional programming!

You can ask anything and, if you want, you can subscribe to my blog.

You can also take a look on my page on Patreon (click this text).

—————————————————————————————————————————————————————

[1]. Bryan O’Sullivan, John Goerzen, Donald Bruce Stewart (2008) Real World Haskell. O’Reilly Media.
[2]. Bird, Richard. (2015) Thinking Functionally with Haskell. Cambridge University Press.
[3]. Bird, Richard. (2010) Pearls of Functional Algorithm Design.  Cambridge University Press.
[4] Hutton, Graham (1999). A tutorial on the universality and expressiveness of fold. Journal of Functional Programming. Cambridge University Press.

Constantin Liculescu

Facebook 

Share Button