------------------------------------------------------------------------
FPS : Fast, packed strings
------------------------------------------------------------------------
This library provides the Data.ByteString library -- strict and lazy
byte arrays manipulable as strings -- providing very time and space
efficient string and IO operations.
For very large data requirements, or constraints on heap size,
Data.ByteString.Lazy is provided, a lazy list of bytestring chunks.
Efficient processing of multi-gigabyte data can be achieved this way.
Requirements:
> Cabal
> GHC 6.4 or greater, or hugs
Building:
> chmod +x Setup.lhs
> ./Setup.lhs configure --prefix=/f/g
> ./Setup.lhs build
> ./Setup.lhs install
After installation, you can run the testsuite as follows:
> cd tests ; make
or
> cd tests ; make hugs
For the full test and benchmark suite, you need GHC and Hugs:
> cd tests ; make everything
Authors:
ByteString was derived from the GHC PackedString library,
originally written by Bryan O'Sullivan, and then by Simon Marlow.
It was adapted, and greatly extended for darcs by David Roundy, and
others. Don Stewart cleaned up and further extended the implementation.
Duncan Coutts wrote much of the .Lazy code.
------------------------------------------------------------------------
Performance, some random numbers (with GHC):
This table compares the performance of common operations ByteString,
from various string libraries.
Size of test data: 21256k, Linux 3.2Ghz P4
FPS7 SPS PS [a]
++ 0.028 ! ! 1.288
length 0.000 0.000 0.000 0.131
pack 0.303 0.502 0.337 -
unpack 3.319* 1.630 7.445 -
compare 0.000 0.000 0.000 0.000
index 0.000 0.000 0.000 0.000
map 2.762* 2.917 4.813 7.286
filter 0.304 2.805 0.954 0.305
take 0.000 0.000 0.024 0.005
drop 0.000 0.000 11.768 0.130
takeWhile 0.000 1.498 0.000 0.000
dropWhile 0.000 1.985 8.447 0.130
span 0.000 9.289 11.144 0.131
break 0.000 9.383 11.268 0.133
lines 0.052 1.114 1.367 2.790
unlines 0.048 ! ! 10.950
words 1.344 2.128 5.644 4.184
unwords 0.016 ! ! 1.305
reverse 0.024 12.997 13.018 1.622
concat 0.000 12.701 11.459 1.163
cons 0.016 2.064 8.358 0.131
empty 0.000 0.000 0.000 0.000
head 0.000 0.000 0.000 0.000
tail 0.000 0.000 14.490 0.130
elem 0.000 1.490 0.001 0.000
last 0.000 - - 0.143
init 0.000 - - 1.147
inits 0.414 - - !
tails 0.460 - - 1.136
intersperse 0.040 - - 10.517
any 0.000 - - 0.000
all 0.000 - - 0.000
sort 0.168 - - !
maximum 0.024 - - 0.183
minimum 0.025 - - 0.185
replicate 0.000 - - 0.053
findIndex 0.096
find 0.120 - - 0.000
elemIndex 0.000 - - 0.000
elemIndicies 0.008 - - 0.314
foldl 0.148
spanEnd 0.000
snoc 0.016
filterChar 0.031
filterNotChar 0.124
join 0.016
split 0.032
findIndices 0.408
splitAt 0.000
lineIndices 0.029
breakOn 0.000
breakSpace 0.000
splitWith 0.329
dropSpace 0.000
dropSpaceEnd 0.000
joinWithChar 0.017
join / 0.016
zip 0.960
zipWith 0.892
isSubstringOf 0.039
isPrefixOf 0.000
isSuffixOf 0.000
count 0.021
Key: FPS6 = FPS 0.6
SPS = Simon Marlow's packedstring prototype
PS = Data.PackedString
[a] = [Char]
- = no function exists
! = stack or memory exhaustion
------------------------------------------------------------------------
== Stress testing really big strings
Doing some stress testing of FPS, here are some results for 0.5G strings.
3.2Ghz box, 2G physical mem.
Size of test data: 524288k
Size of test data: 524288k
Char8 Word8
Effectively O(1) or O(m) where m < n
all 0.000 0.000
any 0.000 0.004
break 0.000 0.000
breakChar 0.000 0.000
breakSpace 0.000
compare 0.000
concat 0.000
drop 0.000
dropSpace 0.000
dropSpaceEnd 0.000
dropWhile 0.000 0.000
elem 0.000 0.000
elemIndex 0.000 0.000
elemIndexLast 0.000 0.000
empty 0.000
head 0.000 0.000
index 0.000 0.000
init 0.000
last 0.000 0.000
length 0.000
notElem 0.000 0.000
span 0.000 0.000
spanChar 0.000 0.000
spanEnd 0.000 0.000
splitAt 0.000
tail 0.000
take 0.000
takeWhile 0.000 0.000
isPrefixOf 0.000
isSuffixOf 0.000
addr1 0.000
addr2 0.000
O(n)
++ 0.676
map 6.080 5.868
cons 0.396 0.396
snoc 0.400 0.400
find 3.240
split 1.204 1.200
lines 2.000
foldl 3.804
unwords 0.552
reverse 0.884
findIndex 3.128
filterChar 0.756 0.732
filter/='f' 8.265 7.012
filterNotChar 4.456 3.388
join 0.400
sort 4.344
maximum 0.776 0.764
minimum 0.772 0.776
replicate 0.008 0.000
elemIndices 0.240 0.240
lineIndices 1.092
joinWithChar 0.400 0.400
isSubstringOf 0.052
count 0.748
slow O(n)
words 38.722
group 77.261
groupBy 96.226
inits 32.430
tails 23.225
findIndices 13.841 15.825
splitWith 18.445 19.225
zip 33.926
zipWith 33.562
|