Jamieson's Tech Blog

Stream of consciousness musings on the state of the art.

Wednesday, June 25, 2008

Dynamic MMAP ran out of room

I always forget this, and wanted to post it here: if you ever get "Dynamic MMAP ran out of room" when apt-get updating, it's probably because you're pinning or getting ready to dist-upgrade. To fix it, just add this line to /etc/apt/apt.conf to increase the apt cache limit to 8MB:

APT::Cache-Limit "8388608";
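
The value is given in bytes: 8388608 is 8 x 1024 x 1024. As a small sketch (the helper name cache_limit_line is made up for this example), here is how you could generate the line for a different size if 8MB turns out not to be enough:

```python
def cache_limit_line(mebibytes):
    """Return an apt.conf line that sets APT::Cache-Limit to the given size."""
    limit_bytes = mebibytes * 1024 * 1024  # the option takes a byte count
    return 'APT::Cache-Limit "%d";' % limit_bytes

print(cache_limit_line(8))   # the value used in this post
print(cache_limit_line(16))  # a larger limit, if 8MB is still too small
```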

Thursday, June 5, 2008

Quickie databases (in JSON, of course)

(Note: all of the JSON files below I created/converted by hand using regular expressions, and I'm placing them into the public domain. The larger ones are compressed with gzip, a fast, efficient, standardized compression format. (It's part of HTTP -- yes, even Internet Explorer has been able to decompress it for years.) If you need a program to decompress it, it's probably already built into your operating system. If your operating system doesn't support it, here's a nickel. I recommend Kubuntu (especially for desktops/laptops or newbies), which is built on Debian (for experts and servers).)

Countries:

What I really needed was the ISO country codes, and although this looked useful, it didn't have them. Of course, a stroke of insanity has struck ISO and these are only available for purchase -- in MS Access format!

So, I converted this from raw text to JSON:
countries_by_name.json
countries_by_iso_code.json

U.S. Zip Codes

Matt Cutts hilariously blogs about his experiences digging up ZIP code databases at the Census vs the USPS. (see also http://www.census.gov/geo/www/tiger/zip1999.html)

And, in JSON format (about 2MB each gunzipped):

zips_by_zip.json.gz
zips_by_city_state.json.gz

Sample (Python session, using the python-cjson module):

>>> import cjson
>>> cjson.decode(open("zips_by_city_state.json").read())["CORTLAND, NY"]
['13045', '36', 76.185675000000003, 42.595174999999998, 29180, 0.001622]
>>> cjson.decode(open("zips_by_zip.json").read())["90210"]
['06', 'CA', 'BEVERLY HILLS', 118.406477, 34.090107000000003, 20700, 0.000696]
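
If you don't have cjson installed, the same lookup should work with nothing but the standard library, and you can read the .gz file directly without gunzipping it first. This is only a sketch: the helper names load_json_gz and lookup_city are my own, and the stand-in file below contains just the one sample row shown above, not the real data.

```python
import gzip
import json
import os
import tempfile

def load_json_gz(path):
    """Read a gzip-compressed JSON file directly, without gunzipping it first."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)

def lookup_city(zips_by_city_state, city, state):
    """Keys in the sample above look like "CORTLAND, NY"; normalize to that form."""
    key = "%s, %s" % (city.strip().upper(), state.strip().upper())
    return zips_by_city_state.get(key)

# Stand-in file holding only the sample row shown above, so the sketch runs
# without the real download.
demo_path = os.path.join(tempfile.mkdtemp(), "zips_by_city_state.json.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    json.dump({"CORTLAND, NY": ["13045", "36", 76.185675, 42.595175, 29180, 0.001622]}, handle)

zips = load_json_gz(demo_path)
print(lookup_city(zips, "Cortland", "ny"))
```

Point load_json_gz at the real zips_by_city_state.json.gz to use the full database.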

IP to Country

This is probably one of the hardest to solve, but there are two free databases that I'm aware of. The last time I reviewed the MaxMind product it had a nasty (nasty!) license clause that essentially said that if you violated the terms of the license, they owned your entire product. Hmm -- what does that mean? I don't know, but I didn't want to find out. Proprietary software == crazy licenses. Ironically, a landmark court case some years ago held that lists of facts cannot be copyrighted.

Haven't reviewed it, but it looks pretty complete and it's released under the GPLv2:
http://software77.net/cgi-bin/ip-country/geo-ip.pl

And, of course, the IP to Country database by webhosting.info:
http://ip-to-country.webhosting.info/

This one makes less sense in JSON format, but here it is as a list of lists (over 4MB uncompressed). See this page for info on how to translate the IP you're testing and scan for the range it fits in.

ip-to-country.json.gz
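
Roughly, translating and scanning works like this: turn the dotted IP into a single 32-bit number, then find the row whose range contains it. The sketch below ASSUMES each row looks like [range_start, range_end, country_code, ...] and that rows are sorted by range_start; check the actual file before relying on that. The function names and the two sample rows (with placeholder codes "XX" and "YY") are invented for illustration.

```python
import bisect

def ip_to_int(dotted):
    """Convert "a.b.c.d" into a single 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def find_country(ranges, dotted):
    """Return the code of the range containing the IP, or None if no range matches.

    ASSUMED layout: each row is [start, end, code, ...], sorted by start.
    """
    target = ip_to_int(dotted)
    starts = [int(row[0]) for row in ranges]
    idx = bisect.bisect_right(starts, target) - 1  # last row starting at or before target
    if idx >= 0 and int(ranges[idx][0]) <= target <= int(ranges[idx][1]):
        return ranges[idx][2]
    return None

# Invented rows, not real data: 1.0.0.0-1.0.0.255 and 1.0.1.0-1.0.3.255
sample = [
    [16777216, 16777471, "XX"],
    [16777472, 16778239, "YY"],
]

print(find_country(sample, "1.0.2.7"))
```

For many lookups you would build the list of range starts once instead of on every call; it is rebuilt here only to keep the sketch short.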

All of the JSON files listed in this post are released into the public domain.

Area Codes

The definitive resource is probably NANPA, but again with the MS Access. What is with that? Does anyone actually use it? What's wrong with text, CSV, JSON, XML, or even XLS?

area_codes.json

States

Another easy one: http://www.usps.com/ncsc/lookups/usps_abbreviations.html

This is a bit more than the 50 states, because it also includes the military locations of service and regions and territories served by the U.S. Postal Service:

states.json

Wednesday, June 4, 2008

HTTP Status Codes (Errors) in JSON and XML Format

Here's a handy conversion of the HTTP status codes and their descriptions from w3.org into both XML and JSON format.

Here's a sample in JSON:
"302": {
"name": "Found",
"description": [
"The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests. This response is only cacheable if indicated by a Cache-Control or Expires header field.",
"The temporary URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).",
"If the 302 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.",
"<pre>Note: RFC 1945 and RFC 2068 specify that the client is not allowed to change the method on the redirected request. However, most existing user agent implementations treat 302 as if it were a 303 response, performing a GET on the Location field-value regardless of the original request method. The status codes 303 and 307 have been added for servers that wish to make unambiguously clear which kind of reaction is expected of the client.</pre>"]
},
Note the embedded <pre> tags. Of course, that was just a snippet. In Python, you can just use cjson or simplejson to parse this:
import cjson
http_errors = cjson.decode(open("http_errors.json").read())
print http_errors["302"]
I recommend the version in JavaScript Object Notation, since most languages can handle it easily.
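
If you'd rather not install cjson or simplejson, the json module in Python's standard library should do the same job. A minimal sketch, using a cut-down copy of the 302 entry above as inline data instead of the real http_errors.json file:

```python
import json

# Cut-down stand-in for http_errors.json: one entry, one description paragraph.
sample = """
{"302": {"name": "Found",
         "description": ["The requested resource resides temporarily under a different URI."]}}
"""

http_errors = json.loads(sample)

# Note that the status codes are string keys ("302"), not integers.
entry = http_errors["302"]
print("302 " + entry["name"])
for paragraph in entry["description"]:
    print(paragraph)
```

For the real file, replace json.loads(sample) with json.load(open("http_errors.json")).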

Here's the files:


Overselling your disks without LVM

It's easy to "oversell" (i.e., pretend that you have more disk available) your disks by simply using sparse files. This is particularly useful with Linux-Vserver, Xen, UML, VMware, etc. Just use these sparse files (already mounted) as the "backing store" for your virtual machines.

For instance, let's say that you only have a 1TB filesystem in RAID1 available on your server, but you want to provide each of your 100 customers with 100GB of "available" space, so that each could (but almost certainly won't) use the full 100GB. They can't all use all that space at the same time, of course (that's 10TB, or ten times as much as you have available!), but you can safely keep them all on that server, at least for a little while. Of course, you can't actually store more data than you have disk for, so monitor your available disk space: when two or three of them start using up all their space, consider moving those VMs to another server!
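
To make the arithmetic and the monitoring concrete, here is a rough sketch. The function names and the 20% free-space threshold are my own choices for illustration, not a recommendation; shutil.disk_usage needs Python 3.3 or newer.

```python
import shutil

def oversell_ratio(customers, promised_gb_each, physical_gb):
    """How many times over the physical disk has been promised."""
    return customers * promised_gb_each / float(physical_gb)

def needs_attention(path, min_free_fraction=0.2):
    """True when the filesystem holding `path` has less free space than the threshold."""
    usage = shutil.disk_usage(path)
    return usage.free / float(usage.total) < min_free_fraction

# 100 customers x 100GB promised on a 1TB (1000GB) filesystem
print(oversell_ratio(100, 100, 1000))

# Check the filesystem that holds the current directory
print(needs_attention("."))
```

Run the second check from cron against the filesystem that holds your image files, and start moving customers when it returns True.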

Side note: (see this note for older kernels)
Because you are mounting so many loopback filesystems, you'll need more loop devices than the eight (/dev/loop0 through /dev/loop7) that typically exist by default. If you are using a newer kernel (2.6.22 or newer), you can create them like this:




# Run as root. Starts at 8 because loop0 through loop7 usually already exist.
for ((i=8;i<150;i++))
do
[ -e /dev/loop$i ] || mknod -m 0600 /dev/loop$i b 7 $i
done



In this post, we'll create 100 "sparse" empty hard drives, each appearing to make 100GB available, but in reality consuming only a hundred-odd megabytes each! (Note: depending on the filesystem you choose to format your sparse images with, you will need approximately 10-20GB of available disk space in /tmp to complete this exercise, but you can immediately recover the space using the procedure shown below. The process takes about twenty minutes on a SATA hard disk.)
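
If sparse files are new to you, this small sketch shows the idea on a single file: the file reports a large size, but the filesystem only allocates blocks that have actually been written. It uses a 64MB file in a temporary directory rather than 100GB, the helper names are made up, and st_blocks is only available on Unix-like systems.

```python
import os
import tempfile

def make_sparse_file(path, size_bytes):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as handle:
        handle.truncate(size_bytes)  # extends the file; no blocks are written

def sizes(path):
    """Return (apparent size, bytes actually allocated on disk)."""
    info = os.stat(path)
    return info.st_size, info.st_blocks * 512  # st_blocks counts 512-byte units

workdir = tempfile.mkdtemp()
image = os.path.join(workdir, "demo.img")
make_sparse_file(image, 64 * 1024 * 1024)

apparent, allocated = sizes(image)
print("apparent: %d bytes, allocated: %d bytes" % (apparent, allocated))
```

On a filesystem that supports sparse files, the allocated figure should be tiny compared to the apparent one. That is the same difference you would see between ls -l and du on the image files created in this post.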

1. Create your 100 "hard drives". Let's use Python to create a list of these hard drives.
cd /tmp # do this example in /tmp; /tmp must have 14GB avail

python

f = open("100.txt", "w")

for i in range(100):
    f.write(str(i+1) + "\n")

f.close()

(press control-D to exit)

Now we have a list of 100 numbers that we can use to automatically create our 100 filesystems.
Please note that each of these 100GB filesystems will consume about 136MB of actual space for overhead. I'm using Debian and reiserfs, so please adjust this procedure if necessary. (I recommend reiserfs for "normal" filesystems containing small files, and XFS for filesystems containing smaller numbers of "large" files, such as database filesystems.)

2. Create 100 empty filesystems. Be sure that you have sufficient space (14GB) and time available, or change the numbers accordingly (you might need to install reiserfsprogs):

mkdir vservers sparse_filesystems

## Create a label for your fstab (once, before the loop, so that it isn't
## added 100 times)

sudo su -c "echo '# Temporary Filesystems' >> /etc/fstab"

for i in `cat 100.txt`
do

## Quoted, because an unquoted #$i would be treated as the start of a comment

echo "Building Sparse Filesystem #$i and mounting it.."

## First, create the empty file. Note we seek right to the
## end, which creates a sparse file:

dd if=/dev/zero of=sparse_filesystems/fs_$i.img count=1 bs=1 seek=100G

## Create a reiser filesystem on it. Reiser is particularly well-suited for this,
## since it doesn't have a built-in inode limit and handles many small files easily

mkreiserfs -l fs_$i -f -q sparse_filesystems/fs_$i.img

## Create the directory for the mounted filesystem

mkdir vservers/vserver_$i

## Add a line to /etc/fstab for each filesystem. The >> must be inside the
## quotes so that the append runs as root rather than as your own user.

sudo su -c "echo `pwd`/sparse_filesystems/fs_$i.img `pwd`/vservers/vserver_$i reiserfs loop 0 0 >> /etc/fstab"

## Mount each loopback filesystem into vservers/vserver_$i, using the same
## absolute path that was written to /etc/fstab

sudo mount `pwd`/vservers/vserver_$i

done


3. You're done! Just check out how much disk space each filesystem has!

dev01: ~ df -h | less

# the 33M is used by ReiserFS for housekeeping (metadata); actual disk space
# used for housekeeping on my system is around 14GB for all 100 of these
# 100GB filesystems.

/tmp/sparse_filesystems/fs_1.img
100G 33M 100G 1% /tmp/vservers/vserver_1
/tmp/sparse_filesystems/fs_2.img
100G 33M 100G 1% /tmp/vservers/vserver_2
/tmp/sparse_filesystems/fs_3.img
100G 33M 100G 1% /tmp/vservers/vserver_3
/tmp/sparse_filesystems/fs_4.img
100G 33M 100G 1% /tmp/vservers/vserver_4
/tmp/sparse_filesystems/fs_5.img
100G 33M 100G 1% /tmp/vservers/vserver_5
/tmp/sparse_filesystems/fs_6.img
100G 33M 100G 1% /tmp/vservers/vserver_6
/tmp/sparse_filesystems/fs_7.img
100G 33M 100G 1% /tmp/vservers/vserver_7
/tmp/sparse_filesystems/fs_8.img
100G 33M 100G 1% /tmp/vservers/vserver_8
/tmp/sparse_filesystems/fs_9.img
100G 33M 100G 1% /tmp/vservers/vserver_9
/tmp/sparse_filesystems/fs_10.img
100G 33M 100G 1% /tmp/vservers/vserver_10
/tmp/sparse_filesystems/fs_11.img
100G 33M 100G 1% /tmp/vservers/vserver_11
/tmp/sparse_filesystems/fs_12.img
100G 33M 100G 1% /tmp/vservers/vserver_12
/tmp/sparse_filesystems/fs_13.img
100G 33M 100G 1% /tmp/vservers/vserver_13
/tmp/sparse_filesystems/fs_14.img
100G 33M 100G 1% /tmp/vservers/vserver_14
/tmp/sparse_filesystems/fs_15.img
100G 33M 100G 1% /tmp/vservers/vserver_15
/tmp/sparse_filesystems/fs_16.img
100G 33M 100G 1% /tmp/vservers/vserver_16
/tmp/sparse_filesystems/fs_17.img
100G 33M 100G 1% /tmp/vservers/vserver_17
/tmp/sparse_filesystems/fs_18.img
100G 33M 100G 1% /tmp/vservers/vserver_18
/tmp/sparse_filesystems/fs_19.img
100G 33M 100G 1% /tmp/vservers/vserver_19
/tmp/sparse_filesystems/fs_20.img
100G 33M 100G 1% /tmp/vservers/vserver_20
/tmp/sparse_filesystems/fs_21.img
100G 33M 100G 1% /tmp/vservers/vserver_21
/tmp/sparse_filesystems/fs_22.img
100G 33M 100G 1% /tmp/vservers/vserver_22
/tmp/sparse_filesystems/fs_23.img
100G 33M 100G 1% /tmp/vservers/vserver_23
/tmp/sparse_filesystems/fs_24.img
100G 33M 100G 1% /tmp/vservers/vserver_24
/tmp/sparse_filesystems/fs_25.img
100G 33M 100G 1% /tmp/vservers/vserver_25
/tmp/sparse_filesystems/fs_26.img
100G 33M 100G 1% /tmp/vservers/vserver_26
/tmp/sparse_filesystems/fs_27.img
100G 33M 100G 1% /tmp/vservers/vserver_27
/tmp/sparse_filesystems/fs_28.img
100G 33M 100G 1% /tmp/vservers/vserver_28
/tmp/sparse_filesystems/fs_29.img
100G 33M 100G 1% /tmp/vservers/vserver_29
/tmp/sparse_filesystems/fs_30.img
100G 33M 100G 1% /tmp/vservers/vserver_30
/tmp/sparse_filesystems/fs_31.img
100G 33M 100G 1% /tmp/vservers/vserver_31
/tmp/sparse_filesystems/fs_32.img
100G 33M 100G 1% /tmp/vservers/vserver_32
/tmp/sparse_filesystems/fs_33.img
100G 33M 100G 1% /tmp/vservers/vserver_33
/tmp/sparse_filesystems/fs_34.img
100G 33M 100G 1% /tmp/vservers/vserver_34
/tmp/sparse_filesystems/fs_35.img
100G 33M 100G 1% /tmp/vservers/vserver_35
/tmp/sparse_filesystems/fs_36.img
100G 33M 100G 1% /tmp/vservers/vserver_36
/tmp/sparse_filesystems/fs_37.img
100G 33M 100G 1% /tmp/vservers/vserver_37
/tmp/sparse_filesystems/fs_38.img
100G 33M 100G 1% /tmp/vservers/vserver_38
/tmp/sparse_filesystems/fs_39.img
100G 33M 100G 1% /tmp/vservers/vserver_39
/tmp/sparse_filesystems/fs_40.img
100G 33M 100G 1% /tmp/vservers/vserver_40
/tmp/sparse_filesystems/fs_41.img
100G 33M 100G 1% /tmp/vservers/vserver_41
/tmp/sparse_filesystems/fs_42.img
100G 33M 100G 1% /tmp/vservers/vserver_42
/tmp/sparse_filesystems/fs_43.img
100G 33M 100G 1% /tmp/vservers/vserver_43
/tmp/sparse_filesystems/fs_44.img
100G 33M 100G 1% /tmp/vservers/vserver_44
/tmp/sparse_filesystems/fs_45.img
100G 33M 100G 1% /tmp/vservers/vserver_45
/tmp/sparse_filesystems/fs_46.img
100G 33M 100G 1% /tmp/vservers/vserver_46
/tmp/sparse_filesystems/fs_47.img
100G 33M 100G 1% /tmp/vservers/vserver_47
/tmp/sparse_filesystems/fs_48.img
100G 33M 100G 1% /tmp/vservers/vserver_48
/tmp/sparse_filesystems/fs_49.img
100G 33M 100G 1% /tmp/vservers/vserver_49
/tmp/sparse_filesystems/fs_50.img
100G 33M 100G 1% /tmp/vservers/vserver_50
/tmp/sparse_filesystems/fs_51.img
100G 33M 100G 1% /tmp/vservers/vserver_51
/tmp/sparse_filesystems/fs_52.img
100G 33M 100G 1% /tmp/vservers/vserver_52
/tmp/sparse_filesystems/fs_53.img
100G 33M 100G 1% /tmp/vservers/vserver_53
/tmp/sparse_filesystems/fs_54.img
100G 33M 100G 1% /tmp/vservers/vserver_54
/tmp/sparse_filesystems/fs_55.img
100G 33M 100G 1% /tmp/vservers/vserver_55
/tmp/sparse_filesystems/fs_56.img
100G 33M 100G 1% /tmp/vservers/vserver_56
/tmp/sparse_filesystems/fs_57.img
100G 33M 100G 1% /tmp/vservers/vserver_57
/tmp/sparse_filesystems/fs_58.img
100G 33M 100G 1% /tmp/vservers/vserver_58
/tmp/sparse_filesystems/fs_59.img
100G 33M 100G 1% /tmp/vservers/vserver_59
/tmp/sparse_filesystems/fs_60.img
100G 33M 100G 1% /tmp/vservers/vserver_60
/tmp/sparse_filesystems/fs_61.img
100G 33M 100G 1% /tmp/vservers/vserver_61
/tmp/sparse_filesystems/fs_62.img
100G 33M 100G 1% /tmp/vservers/vserver_62
/tmp/sparse_filesystems/fs_63.img
100G 33M 100G 1% /tmp/vservers/vserver_63
/tmp/sparse_filesystems/fs_64.img
100G 33M 100G 1% /tmp/vservers/vserver_64
/tmp/sparse_filesystems/fs_65.img
100G 33M 100G 1% /tmp/vservers/vserver_65
/tmp/sparse_filesystems/fs_66.img
100G 33M 100G 1% /tmp/vservers/vserver_66
/tmp/sparse_filesystems/fs_67.img
100G 33M 100G 1% /tmp/vservers/vserver_67
/tmp/sparse_filesystems/fs_68.img
100G 33M 100G 1% /tmp/vservers/vserver_68
/tmp/sparse_filesystems/fs_69.img
100G 33M 100G 1% /tmp/vservers/vserver_69
/tmp/sparse_filesystems/fs_70.img
100G 33M 100G 1% /tmp/vservers/vserver_70
/tmp/sparse_filesystems/fs_71.img
100G 33M 100G 1% /tmp/vservers/vserver_71
/tmp/sparse_filesystems/fs_72.img
100G 33M 100G 1% /tmp/vservers/vserver_72
/tmp/sparse_filesystems/fs_73.img
100G 33M 100G 1% /tmp/vservers/vserver_73
/tmp/sparse_filesystems/fs_74.img
100G 33M 100G 1% /tmp/vservers/vserver_74
/tmp/sparse_filesystems/fs_75.img
100G 33M 100G 1% /tmp/vservers/vserver_75
/tmp/sparse_filesystems/fs_76.img
100G 33M 100G 1% /tmp/vservers/vserver_76
/tmp/sparse_filesystems/fs_77.img
100G 33M 100G 1% /tmp/vservers/vserver_77
/tmp/sparse_filesystems/fs_78.img
100G 33M 100G 1% /tmp/vservers/vserver_78
/tmp/sparse_filesystems/fs_79.img
100G 33M 100G 1% /tmp/vservers/vserver_79
/tmp/sparse_filesystems/fs_80.img
100G 33M 100G 1% /tmp/vservers/vserver_80
/tmp/sparse_filesystems/fs_81.img
100G 33M 100G 1% /tmp/vservers/vserver_81
/tmp/sparse_filesystems/fs_82.img
100G 33M 100G 1% /tmp/vservers/vserver_82
/tmp/sparse_filesystems/fs_83.img
100G 33M 100G 1% /tmp/vservers/vserver_83
/tmp/sparse_filesystems/fs_84.img
100G 33M 100G 1% /tmp/vservers/vserver_84
/tmp/sparse_filesystems/fs_85.img
100G 33M 100G 1% /tmp/vservers/vserver_85
/tmp/sparse_filesystems/fs_86.img
100G 33M 100G 1% /tmp/vservers/vserver_86
/tmp/sparse_filesystems/fs_87.img
100G 33M 100G 1% /tmp/vservers/vserver_87
/tmp/sparse_filesystems/fs_88.img
100G 33M 100G 1% /tmp/vservers/vserver_88
/tmp/sparse_filesystems/fs_89.img
100G 33M 100G 1% /tmp/vservers/vserver_89
/tmp/sparse_filesystems/fs_90.img
100G 33M 100G 1% /tmp/vservers/vserver_90
/tmp/sparse_filesystems/fs_91.img
100G 33M 100G 1% /tmp/vservers/vserver_91
/tmp/sparse_filesystems/fs_92.img
100G 33M 100G 1% /tmp/vservers/vserver_92
/tmp/sparse_filesystems/fs_93.img
100G 33M 100G 1% /tmp/vservers/vserver_93
/tmp/sparse_filesystems/fs_94.img
100G 33M 100G 1% /tmp/vservers/vserver_94
/tmp/sparse_filesystems/fs_95.img
100G 33M 100G 1% /tmp/vservers/vserver_95
/tmp/sparse_filesystems/fs_96.img
100G 33M 100G 1% /tmp/vservers/vserver_96
/tmp/sparse_filesystems/fs_97.img
100G 33M 100G 1% /tmp/vservers/vserver_97
/tmp/sparse_filesystems/fs_98.img
100G 33M 100G 1% /tmp/vservers/vserver_98
/tmp/sparse_filesystems/fs_99.img
100G 33M 100G 1% /tmp/vservers/vserver_99
/tmp/sparse_filesystems/fs_100.img
100G 33M 100G 1% /tmp/vservers/vserver_100

Ok, neat filesystem tricks; now you'd like to delete these filesystems?

First, unmount them all:

dev01: ~ for i in `cat 100.txt `
do
sudo umount vservers/vserver_$i
done

Second, remove those two temporary directories:

rm -Rf vservers sparse_filesystems


Finally, edit your /etc/fstab to remove the nonexistent mount entries:

sudo vi /etc/fstab

(If you don't already know how to use vi, try nano instead.)
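
If you'd rather not delete 100 lines by hand, here is a hedged sketch of filtering them out instead. It assumes the entries were added exactly as in step 2, so every line to drop either contains /sparse_filesystems/fs_ or is the "# Temporary Filesystems" label. The function name is made up, and it works on text only: run it against a copy of /etc/fstab and review the result before replacing the real file.

```python
def strip_sparse_entries(fstab_text, marker="/sparse_filesystems/fs_"):
    """Return fstab_text without the loopback entries and label added in step 2."""
    kept = []
    for line in fstab_text.splitlines():
        if marker in line:
            continue  # one of the 100 loopback mount entries
        if line.strip() == "# Temporary Filesystems":
            continue  # the label written before the entries
        kept.append(line)
    return "\n".join(kept) + "\n"

# Made-up example contents, not a real fstab
sample = (
    "/dev/sda1 / ext3 defaults 0 1\n"
    "# Temporary Filesystems\n"
    "/tmp/sparse_filesystems/fs_1.img /tmp/vservers/vserver_1 reiserfs loop 0 0\n"
    "/tmp/sparse_filesystems/fs_2.img /tmp/vservers/vserver_2 reiserfs loop 0 0\n"
)

print(strip_sparse_entries(sample))
```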

Additional ideas? Use debootstrap to fill all of those filesystems up with cool, crisp Debian Stable! Each bare-bones installation will consume only about 200MB!

Another way to do this using Device Mapper:
http://jons-thoughts.blogspot.com/2007/11/creating-sparse-block-devices.html
