Planet ALUG

Syndicate content
Planet ALUG - http://planet.alug.org.uk/
Updated: 1 hour 42 min ago

MJ Ray: Rebooting democracy? The case for a citizens constitutional convention.

Fri, 20/02/2015 - 04:03

I’m getting increasingly cynical about our largest organisations and their voting-centred approach to democracy. You vote once, for people rather than programmes, then you’re meant to leave them to it for up to three years until they stand for reelection and in most systems, their actions aren’t compared with what they said they’d do in any way.

I have this concern about Cooperatives UK too, but then its CEO publishes http://www.uk.coop/blog/ed-mayo/2015-02-18/rebooting-democracy-case-citizens-constitutional-convention and I think there may be hope for it yet. Well worth a read if you want to organise better groups.

Categories: LUG Community Blogs

Chris Lamb: Calculating the ETA to zero in shell

Fri, 30/01/2015 - 20:49
< Faux> I have a command which emits a number. This number is heading towards zero. I want to know when it will arrive at zero, and how close to zero it has got.

Damn right you can.

eta2zero () { A=$(eval ${@}) while [ ${A} -gt 0 ] do B=$(eval ${@}) printf %$((${A} - ${B}))s A=${B} sleep 1 done | pv -s ${A} >/dev/null }

In action:

$ rm -rf /big/path & [1] 4895 $ eta2zero find /big/path \| wc -l 10 B 0:00:14 [ 0 B/s] [================================> ] 90% ETA 0:00:10

(Sincere apologies for the lack of strace...)

Categories: LUG Community Blogs

Daniel Silverstone (Kinnison): Caius -- A heirarchical delegable password safe

Thu, 29/01/2015 - 13:08

A long while ago I, Rob Kendrick, Clive Jones (and possibly others) sat down and tried to come up with a way to store passwords a-la Password Safe. However, being us, we wanted to ensure a number of properties which password safes commonly don't have. We wanted to allow the delegation of access to some subset of the passwords. We also wanted for it to be reasonable to deny that there is content which has not been decrypted.

I was reminded of this work when I was discussing the concept of deniable storage of secrets with a colleague (An idea I'll expand upon in another blog post at another time). I am therefore presenting, with little change other than formatting the design from years ago. I would be very interested if anyone knows of software which meets the properties of the Caius system since I would like to have one but simply don't trust myself (see another future posting) to write it right now.

Caius

The following concepts are assumed to be understood:

The 'Caius' system is a password-safe type system sporting hierarchical delegable access to the data it stores. The 'Caius Fob' is the data-store for the system.

The 'Caius Fob' is a file which consists of a header and then three sections. The header identifies it as such a file, the first section lists a number of 'External IDs' which can be used to access portions of the file. The second section lists ACL entries as defined below. The third section of the file is the encrypted data looked after by this file. It is not intended that the holder of a CaiusFob be able to deny it is a CaiusFob, but it is expected that it be possible to deny an ability to decrypt (perhaps by lacking a password) any ACL entries. Given that the structure of the file is known, it is necessary that there be external IDs for which the password or GPG key is not valid or cannot decrypt an ACL entry, and ACL entries which even if decrypted may not be valid, and ACL entries which even if decrypted and valid may not be used to encode any data blocks.

An External ID External ID ::= LENGTH TYPE DATA

Where TYPE is one of: * 0: Unused (ID slot placeholder) * 1: GPG key, where DATA is the keyid of the gpg key * 2: Password, where DATA is some hash of a password which can be used to derive a key for decrypting an ACL entry.

The list of external ids forms a numbered sequence where the index (0-based) into the sequence is the External ID number. (EIDnr)

An ACL Entry ACL Entry ::= LENGTH EIDnr DATA HMAC

The EIDnr is the number of the External ID as explained above. The LENGTH is the length of the DATA section which is a key-pair as explained below, encrypted to the external id. The HMAC uses the authentication key in the key-pair in the DATA section, and authenticates the EIDnr, LENGTH, DATA tuple.

One possibility for increasing deniability is to remove the EIDnr from this part of the file, and assume that for every external ID you try to decrypt all ACLs you've not succeeded in decrypting thus-far. This has the benefit of being able to deny that an ACL entry ought to be decryptable with the credentials you hold, but also an increased inability to know if you have successfully unlocked everything appropriate to being able to fully manipulate a CaiusFob. This tradeoff is currently set in favour of better understanding of the content, but a future design feature might suggest EIDnr should always be -1 to indicate "unknown, try every EID".

A key pair Key Pair ::= ENCRYPTIONKEY AUTHENTICATIONKEY

The ENCRYPTIONKEY is used to initialise the stream cipher for the data section. The AUTHENTICATIONKEY is used to compute HMACs for the appropriate ACL entries or data blocks (as defined below)

The data section

First consider a set of stream ciphers. There exists always one cipher which we will call the NULL cipher. It is defined such that Cipher(Byte, Offset) == Byte and is always available. Then there is a cipher initialised for each key pair we can successfully extract from the ACL entry section of the file.

Each of these ciphers is initialised and made ready such that the data section can be xored with the bytes as they come out of the stream ciphers in an attempt to decrypt the blocks. Essentially this is equivalent to decrypting the entire data section with each cipher in turn to produce N proposed cleartexts which can then be examined to find decrypted blocks.

Whenever a cipher, combined with the data stream in the file, manages to produce a sequence of eight bytes of value zero, we have reached a synchronisation point and what follows is a data block enciphered with which ever cipher managed to reveal the synchronisation sequence.

Since it is vanishingly unlikely that you will find eight zeros in a row when playing about with arbitrary cipher initialisation, we consider this to be an acceptable synchronisation sequence.

Once we have found a sync sequence, we can know the length of this block and thus we do not look for sync markers again until after the block we have just found.

A data block Data block ::= DATALENGTH PADLENGTH TYPE DATA PAD HMAC

Such that each field is the obvious, DATA is DATALENGTH bytes of data, the texture of which is defined by TYPE and PAD is PADLENGTH arbitrary bytes which pad this block. HMAC is keyed using the authentication key associated with the stream cipher managing to decrypt this and is over the DATALENGTH, PADLENGTH, TYPE, DATA, PAD tuple.

If TYPE is zero then this is a ''free-space'' block and will typically contain zero bytes of DATA and some number of padding. This is however arbitrary and not enforced, the free space can be DATA if the implementer prefers and implementations are encouraged if possible to randomise the distribution of the consumed space between the DATA and PAD sections.

A node block TYPE == 1 (Node) DATA ::= MY_ID PARENT_ID NAME NOTES

MY_ID is a unique ID for this node. (generally a random number, probably 64 bits long, perhaps a UUID). PARENT_ID if not all NULLs is the ID for the parent node. If all NULLs then this is the root of a heirarchy. NAME is a NULL terminated byte string of one or more characters which is the name of this node. It may consist only of any characters other than NULL and the forward-slash character. NOTES is a byte string of zero or more characters, NULL terminated. Note that the DATALENGTH of the data block clearly delimits this field but the NULL is present to aid parsing.

A system block TYPE == 2 (System) DATA ::= PARENT_ID USERNAME PASSWORD EXPIRYDATE NOTES

PARENT_ID is the node to which this block belongs. It is required that any system blocks you succeed in decrypting can be placed within a node you succeed in decrypting. If the library encounters a system block which belongs to a node it cannot find then this is considered to be a corrupt system block and will be treated as though it could not be decrypted.

USERNAME is a byte string of one or more characters terminated by a NULL, ditto for PASSWORD and as for a node block, the NOTES are NULL terminated also.

EXPIRYDATE is a byte string of characters in RFC-822 date format.

Implementation notes

It is expected that any implementation of Caius will adhere to the following guidelines in order to enhance the security of content over time.

  1. Any time a block is invalidated (such as by the changing of a password, the obsoleting of an entry, the changing of notes, names, or reparenting a node) anywhere from one to all of the bits in the block can be changed. Trivially, this includes the synchronisation sequence preceeding the block as if the synchronisation sequence isn't present then the block does not exist.

  2. Every time a CaiusFob is altered in any way, some amount of known intra-block padding must be altered in a random way. Ideally this will be done so that it looks like number 1 has happened somewhere else in the file as well. Anywhere from zero to all of the padding can be thusly altered in any given change.

  3. No attempt will be made to write to any part of the file which cannot be positively identified as padding unless the user has explicitly stated that they will accept damage to data they cannot currently decrypt.

  4. No indication will be given to the user if any part of the file was unable to be decrypted or failed an HMAC check. Such data is simply incorrectly decrypted and thus ignored.

  5. Intrablock padding can be positively identified if you have two consecutive blocks in a CaiusFob such that the number of bytes between them could not possibly hold the simplest of free space blocks.

  6. When appending a block to a CaiusFob it is encouraged to place up to 50% of the size of the intrablock spacing before it as random padding, and up to 50% afterwards also. Naturally anywhere between zero and the full amount is acceptable, ideally the implementation would choose at random each time.

Categories: LUG Community Blogs

Daniel Silverstone (Kinnison): I promise that I...

Tue, 27/01/2015 - 22:27

A friend and ex-colleague Francis Irving (@frabcus on Twitter) has recently been on a bit of an anti C/C++ kick, including tweeting about the problems which happen in software written in so called "insecure" languages, and culminating in his promise website which boldly calls for people to promise to not use C/C++ for new projects.

Recently I've not been programming enough. I'm still a member of the NetSurf browser project, and I'm still (slowly) working on Gitano from time to time. I am still (in theory) an upstream on the Cherokee Webserver project (and I really do need to sit down and fix some bugs in logging) and any number of smaller projects as well. I am still part of Debian and would like to start making positive contributions beyond voting and egging others on, but I have been somewhat burned out by everything going on in my life, including both home and work. While I am hardly in any kind of mental trouble, I've simply not had any tuits of late.

I find it very hard to make public promises which I know I am going to break. Francis suggested that the promise can be broken which while it might not devalue it for him (or you) it does for me. I do however think that public promises are a good thing, especially when they foster useful discussion in the communities I am part of, so from that point of view I very much support Francis in his efforts.

Even given all of the above, I'd like to make a promise statement of my own. I'd like to make it in public and hopefully that'll help me to keep it. I know I could easily fail to live up to this promise, but I'm hoping I'll do well and to some extent I'm relying on all you out there to help me keep it. Given we're almost at the end of the month, I am making the promise now and I want it to take effect starting on the 1st of February 2015.

I hereby promise that I will do better at contributing to all the projects I am nominally part of, making at least one useful material contribution to at least one project every week. I also promise to be more mindful of the choices I make when choosing how to implement solutions to problems, selecting appropriate languages and giving full consideration to how my solution might be attacked if appropriate.

I can't and won't promise not to use C/C++ but if you honestly feel you can make that promise, then I'm certain Francis would love for you to head over to his promise website and pledge. Also regardless of your opinions, please do join in the conversation, particularly regarding being mindful of attack vectors whenever you write something.

Categories: LUG Community Blogs

Chris Lamb: Recent Redis hacking

Sun, 25/01/2015 - 20:52

I've done a bunch of hacking on the Redis key/value database server recently:

  • Lua-based maxmemory eviction scripts. (#2319)

    (This changeset was sponsored by an anonymous client.)

    Redis typically stores the entire data set in memory, using the operating system's virtual memory facilities if required. However, one can use Redis more like a cache or ring buffer by enabling a "maxmemory policy" where a RAM limit is set and then data is evicted when required based on a predefined algorithm.

    This change enables entirely custom control over exactly what data to remove from RAM when this maxmemory limit is reached. This is an advantage over the existing policies of, say, removing entire keys based on the existing TTL, Least Recently Used (LRU) or random eviction strategies as it permits bespoke behaviours based on application-specific requirements, crucially without maintaining a private fork of Redis.

    As an example behaviour of what is possible with this change, to remove the lowest ranked member of an arbitrary sorted set, you could load the following eviction policy script:

    local bestkey = nil local bestval = 0 for s = 1, 5 do local key = redis.call("RANDOMKEY") local type_ = redis.call("TYPE", key) if type_.ok == "zset" then local tail = redis.call("ZRANGE", key, "0", "0", "WITHSCORES") local val = tonumber(tail[2]) if not bestkey or val < bestval then bestkey = key bestval = val end end end if not bestkey then -- We couldn't find anything to remove, so return an error return false end redis.call("ZREMRANGEBYRANK", bestkey, "0", "0") return true
  • TCP_FASTOPEN support. (#2307)

    The aim of TCP_FASTOPEN is to eliminate one roundtrip from a TCP conversation by allowing data to be included as part of the SYN segment that initiates the connection. (More info.)

  • Support infinitely repeating commands in redis-cli. (#2297)

  • Add --failfast option to testsuite runner. (#2290)

  • Add a -q (quiet) argument to redis-cli. (#2305)

  • Making some Redis Sentinel defaults a little saner. (#2292)

I also made the following changes to the Debian packaging:

  • Add run-parts(8) directories to be executed at various points in the daemon's lifecycle. (e427f8)

    This is especially useful for loading Lua scripts as they are not persisted across restarts.

  • Split out Redis Sentinel into its own package. (#775414, 39f642)

    This makes it possible to run Sentinel sanely on Debian systems without bespoke scripts, etc.

  • Ensure /etc/init.d/redis-server start idempotency with --oknodo (60b7dd)

    Idempotency in initscripts is especially important given the rise of configuration managment systems.

  • Uploaded 3.0.0 RC2 to Debian experimental. (37ac55)

  • Re-enabled the testsuite. (7b9ed1)

Categories: LUG Community Blogs

Chris Lamb: Slack integration for Django

Fri, 23/01/2015 - 22:46

I recently started using the Slack group chat tool in a few teams. Wishing to add some vanity notifications such as sales and user growth milestones from some Django-based projects, I put together an easy-to-use integration between the two called django-slack.

Whilst you can use any generic Python-based method of sending messages to Slack, using a Django-specific integration has some advantages:

  • It can use the Django templating system, rather than constructing messages "by hand" in views.py and models.py which violates abstraction layers and often requires unwieldy and ugly string manipulation routines that would be trivial inside a regular template.
  • It can easily enabled and disabled in certain environments, preventing DRY violations by centralising logic to avoid sending messages in development, staging environments, etc.
  • It can use other Django idioms such as a pluggable backend system for greater control over exactly how messages are transmitted to the Slack API (eg. sent asynchronously using your queuing system, avoiding slowing down clients).

Here is an example of how to send a message from a Django view:

from django_slack import slack_message @login_required def view(request, item_id): item = get_object_or_404(Item, pk=item_id) slack_message('items/viewed.slack', { 'item': item, 'user': request.user, }) return render(request, 'items/view.html', { 'item': item, })

Where items/viewed.slack (in your templates directory) might contain:

{% extends django_slack %} {% block text %} {{ user.get_full_name }} just viewed {{ item.title }} ({{ item.content|urlize }}). {% endblock %}

.slack files are regular Django templates — text is automatically escaped as appropriate and that you can use the regular template filters and tags such as urlize, loops, etc.

By default, django-slack posts to the #general channel, but it can be overridden on a per-message basis by specifying a channel block:

{% block channel %} #mychannel {% endblock %}

You can also set the icon, URL and emoji in a similar fashion. You can set global defaults for all of these attributes to avoid DRY violations within .slack templates as well.

For more information please see the project homepage or read the documentation. Patches and other contributions are welcome via the django-slack GitHub project.

Categories: LUG Community Blogs

MJ Ray: Outsourcing email to Google means SPF allows phishing?

Thu, 22/01/2015 - 03:57

I expect this is obvious to many people but bahumbug To Phish, or Not to Phish? just woke me up to the fact that if Google hosts your company email then its Sender Policy Framework might make other Google-sent emails look legitimate for your domain. When combined with the unsupportive support of the big free webmail hosts, is this another black mark against SPF?

Categories: LUG Community Blogs

Chris Lamb: Sprezzatura

Wed, 21/01/2015 - 10:31

Wolf Hall on Twitter et al:

He says, "Majesty, we were talking of Castiglione's book. You have found time to read it?"

"Indeed. He extrolls sprezzatura. The art of doing everything gracefully and well, without the appearance of effort. A quality princes should cultivate."

"Yes. But besides sprezzatura one must exhibit at all times a dignified public restraint..."

Categories: LUG Community Blogs

Jonathan McDowell: Moving to Jekyll

Wed, 21/01/2015 - 10:00

I’ve been meaning to move away from Movable Type for a while; they no longer provide the “Open Source” variant, I’ve had some issues with the commenting side of things (more the fault of spammers than Movable Type itself) and there are a few minor niggles that I wanted to resolve. Nothing has been particularly pressing me to move and I haven’t been blogging as much so while I’ve been keeping an eye open for a replacement I haven’t exerted a lot of energy into the process. I have a little bit of time at present so I asked around on IRC for suggestions. One was ikiwiki, which I use as part of helping maintain the SPI website (and think is fantastic for that), the other was Jekyll. Both are available as part of Debian Jessie.

Jekyll looked a bit fancier out of the box (I’m no web designer so pre-canned themes help me a lot), so I decided to spend some time investigating it a bit more. I’d found a Movable Type to ikiwiki converter which provided a starting point for exporting from the SQLite3 DB I was using for MT. Most of my posts are in markdown, the rest (mostly from my Blosxom days) are plain HTML, so there wasn’t any need to do any conversion on the actual content. A minor amount of poking convinced Jekyll to use the same URL format (permalink: /:year/:month/:title.html in the _config.yml did what I wanted) and I had to do a few bits of fix up for some images that had been uploaded into MT, but overall fairly simple stuff.

Next I had to think about comments. My initial thought was to just ignore them for the moment; they weren’t really working on the MT install that well so it’s not a huge loss. I then decided I should at least see what the options were. Google+ has the ability to embed in your site, so I had a play with that. It worked well enough but I didn’t really want to force commenters into the Google ecosystem. Next up was Disqus, which I’ve seen used in various places. It seems to allow logins via various 3rd parties, can cope with threading and deals with the despamming. It was easy enough to integrate to play with, and while I was doing so I discovered that it could cope with importing comments. So I tweaked my conversion script to generate a WXR based file of the comments. This then imported easily into Disqus (and also I double checked that the export system worked).

I’m sure the use of a third party to handle comments will put some people off, but given the ability to export I’m confident if I really feel like dealing with despamming comments again at some point I can switch to something locally hosted. I do wish it didn’t require Javascript, but again it’s a trade off I’m willing to make at present.

Anyway. Thanks to Tollef for the pointer (and others who made various suggestions). Hopefully I haven’t broken (or produced a slew of “new” posts for) any of the feed readers pointed at my site (but you should update to use feed.xml rather than any of the others - I may remove them in the future once I see usage has died down).

(On the off chance it’s useful to someone else the conversion script I ended up with is available. There’s a built in Jekyll importer that may be a better move, but I liked ending up with a git repository containing a commit for each post.)

Categories: LUG Community Blogs

Chris Lamb: Adjusting a backing track with SoX

Sun, 18/01/2015 - 12:28

Earlier today I came across some classical sheet music that included a "playalong" CD, just like a regular recording except it omits the solo cello part. After a quick listen it became clear there were two problems:

  • The recording was made at A=442, rather than the more standard A=440.
  • The tempi of the movements was not to my taste, either too fast or too slow.

SoX, the "Swiss Army knife of sound processing programs", can easily adjust the latter, but to remedy the former it must be provided with a dimensionless "cent" unit—ie. 1/100th of a semitone—rather than the 442Hz and 440Hz reference frequencies.

First, we calculate the cent difference with:

Next, we rip the material from the CD:

$ sudo apt-get install ripit flac [..] $ ripit --coder 2 --eject --nointeraction [..]

And finally we adjust the tempo and pitch:

$ apt-get install sox libsox-fmt-mp3 [..] $ sox 01.flac 01.mp3 pitch -7.85 tempo 1.00 # (Tuning notes) $ sox 02.flac 02.mp3 pitch -7.85 tempo 0.95 # Too fast! $ sox 03.flac 03.mp3 pitch -7.85 tempo 1.01 # Close.. $ sox 04.flac 04.mp3 pitch -7.85 tempo 1.03 # Too slow!

(I'm converting to MP3 at the same time it'll be more convenient on my phone.)

Categories: LUG Community Blogs

Jonathan McDowell: Tracking a ship around the world

Tue, 13/01/2015 - 16:07

I moved back from the California Bay Area to Belfast a while back and for various reasons it looks like I’m going to be here a while, so it made sense to have my belongings shipped over here. They haven’t quite arrived yet, and I’ll do another post about that process once they have, but I’ve been doing various tweets prefixed with “[shipping]” during the process. Various people I’ve spoken to (some who should know me better) thought this was happening manually. It wasn’t. If you care about how it was done, read on.

I’d been given details of the ship carrying my container, and searching for that turned up the excellent MarineTraffic which let me see the current location of the ship. Turns out ships broadcast their location using AIS and anyone with a receiver can see the info. Very cool, and I spent some time having a look at various bits of shipping around the UK out of interest. I also found the ship’s itinerary which give me some idea of where it would be calling and when. Step one was to start recording this data; it was time sensitive and I wanted to be able to see historical data. I took the easy route and set up a cron job to poll the location and itinerary on an hourly basis, and store the results. That meant I had the data over time, if my parsing turned out to miss something I could easily correct it, and that I wasn’t hammering Marine Traffic while writing the parsing code.

Next I wanted to parse the results, store them in a more manageable format than the HTML, and alert me when the ship docked somewhere or set off again. I’ve been trying to learn more Python rather than doing my default action of turning to Perl for these things, and this seemed like a simple enough project to try out. Beautiful Soup seemed to turn up top for HTML parsing in Python, so that formed the basis. Throwing the info into a database so I could do queries felt like the right move so I used SQLite - if this had been more than a one off I’d have considered looking at PostgreSQL and its GIS support. Finally Tweepy made it very easy to tweet from Python in about 4 lines of code. The whole thing weighed in at only 175 lines of code, mostly around pulling the info out of the HTML and then a little to deal with checking for state changes against the current status and the last piece of info in the database.

The pieces of information I chose to store were the time of the update (i.e. when the ship sent it, not when my script ran), reported area, reported state, the position + course, reported origin, reported destination and eta. The fact this is all in a database makes it very easy to do a few queries on the data.

How fast did the ship go?

sqlite> SELECT MAX(speed) FROM status; MAX(speed) 21.9

What areas did it report?

sqlite> SELECT area FROM status GROUP BY area; area - Atlantic North California Caribbean Sea Celtic Sea English Channel Hudson River Pacific North Panama Canal

What statuses did we see?

sqlite> SELECT status FROM status GROUP BY status; status At Anchor Moored Stopped Underway Underway using Engine

Finally having hourly data lets me draw a map of where the ship went. The data isn’t complete, because the free AIS info depends on the ship being close enough to a receiving station. That means there were a few days in the North Atlantic without updates, for example. However there’s enough to give a good idea of just how well traveled my belongings are, and it gave me an excuse to play with OpenLayers.

(Apologies if the zoom buttons aren’t working for you here; I think I need to kick the CSS in some manner I haven’t quite figured out yet.)

Categories: LUG Community Blogs

Chris Lamb: Giant's Causeway puzzle

Mon, 12/01/2015 - 10:34

Over Christmas I found the above puzzle at my parent's house. I don't know if it has a name but it seemed apt to name it after The Giant's Causeway.

The idea is that you swap and/or rotate the outer tiles until the colours at the edges match. The center tile is fixed, or at least I assume it is as the wooden edges are deliberately less rounded.

Of course, solving it manually would be boring so here's a dumb brute force solution. I actually admire how inefficient and stupid it is...

import itertools class Tile(list): def __getitem__(self, x): return super(Tile, self).__getitem__(x % len(self)) def rotate(self, x): return Tile(self[x:] + self[:x]) COLOURS = ('YELLOW', 'WHITE', 'BLACK', 'RED', 'GREEN', 'BLUE') YELLOW, WHITE, BLACK, RED, GREEN, BLUE = range(len(COLOURS)) TILES = ( Tile((WHITE, YELLOW, RED, BLUE, BLACK, GREEN)), Tile((RED, BLUE, BLACK, YELLOW, GREEN, WHITE)), Tile((WHITE, BLACK, YELLOW, GREEN, BLUE, RED)), Tile((WHITE, BLUE, GREEN, YELLOW, BLACK, RED)), Tile((GREEN, BLUE, BLACK, YELLOW, RED, WHITE)), Tile((RED, YELLOW, GREEN, BLACK, BLUE, WHITE)), ) CENTER = Tile((WHITE, BLACK, RED, GREEN, BLUE, YELLOW)) def pairwise(it): a, b = itertools.tee(it) next(b, None) return itertools.izip(a, b) def validate(xs): for idx, x in enumerate(xs): if x[idx + 3] != CENTER[idx]: raise ValueError("Tile does not match center") for idx, (a, b) in enumerate(pairwise(xs)): if a[idx + 2] != b[idx + 5]: raise ValueError("Tile does not match previous anticlockwise tile") return xs def find_solution(): # For all possible tile permutations.. for xs in itertools.permutations(TILES): # ... try all possible rotations for ys in itertools.product(range(len(COLOURS)), repeat=len(COLOURS)): try: return validate([x.rotate(y) for x, y in zip(xs, ys)]) except ValueError: pass raise ValueError("Could not find a solution.") for x in find_solution(): print ', '.join(COLOURS[y] for y in x)

This prints (after 5 minutes!):

$ python giants.py GREEN, BLUE, RED, WHITE, BLACK, YELLOW WHITE, BLUE, GREEN, YELLOW, BLACK, RED YELLOW, GREEN, BLACK, BLUE, WHITE, RED GREEN, WHITE, YELLOW, RED, BLUE, BLACK RED, BLUE, BLACK, YELLOW, GREEN, WHITE BLUE, BLACK, YELLOW, RED, WHITE, GREEN python giants.py 269.08s user 0.02s system 99% cpu 4:29.36 total

UPDATE: Seems like this puzzle is called "Circus Seven". Now to dust off my Prolog...

Categories: LUG Community Blogs

Chris Lamb: Uploading a large number of files to Amazon S3

Fri, 09/01/2015 - 13:31

I recently had to upload a large number (~1 million) of files to Amazon S3.

My first attempts revolved around s3cmd (and subsequently s4cmd) but both projects seem to based around analysing all the files first, rather than blindly uploading them. This not only requires a large amount of memory, non-trivial experimentation, fiddling and patching is also needed to avoid unnecessary stat(2) calls. I even tried a simple find | xargs -P 5 s3cmd put [..] but I just didn't trust the error handling correctly.

I finally alighted on s3-parallel-put, which worked out well. Here's a brief rundown on how to use it:

  1. First, change to your source directory. This is to ensure that the filenames created in your S3 bucket are not prefixed with the directory structure of your local filesystem — whilst s3-parallel-put has a --prefix option, it is ignored if you pass a fully-qualified source, ie. one starting with a /.
  2. Run with --dry-run --limit=1 and check that the resulting filenames will be correct after all:
$ export AWS_ACCESS_KEY_ID=FIXME $ export AWS_SECRET_ACCESS_KEY=FIXME $ /path/to/bin/s3-parallel-put \ --bucket=my-bucket \ --host=s3.amazonaws.com \ --put=stupid \ --insecure \ --dry-run --limit=1 \ . [..] INFO:s3-parallel-put[putter-21714]:./yadt/profile.Profile/image/circle/807.jpeg -> yadt/profile.Profile/image/circle/807.jpeg [..]
  1. Remove --dry-run --limit=1, and let it roll.
Categories: LUG Community Blogs

Chris Lamb: Uploading a large number of files to S3

Fri, 09/01/2015 - 13:31

I recently had to upload a large number (~1 million) of files to Amazon S3.

My first attempts revolved around s3cmd (and subsequently s4cmd) but both projects seem to based around analysing all the files first, rather than blindly uploading them. This not only requires a large amount of memory, non-trivial experimentation, fiddling and patching is also needed to avoid unnecessary stat(2) calls. I even tried a simple find | xargs -P 5 s3cmd put [..] but I just didn't trust the error handling correctly.

I finally alighted on s3-parallel-put, which worked out well. Here's a brief rundown on how to use it:

  1. First, change to your source directory. This is to ensure that the filenames created in your S3 bucket are not prefixed with the directory structure of your local filesystem — whilst s3-parallel-put has a --prefix option, it is ignored if you pass a fully-qualified source, ie. one starting with a /.
  2. Run with --dry-run --limit=1 and check that the resulting filenames will be correct after all:
$ export AWS_ACCESS_KEY_ID=FIXME $ export AWS_SECRET_ACCESS_KEY=FIXME $ /path/to/bin/s3-parallel-put \ --bucket=my-bucket \ --host=s3.amazonaws.com \ --put=stupid \ --insecure \ --dry-run --limit=1 \ . [..] INFO:s3-parallel-put[putter-21714]:./yadt/profile.Profile/image/circle/807.jpeg -> yadt/profile.Profile/image/circle/807.jpeg [..]
  1. Remove --dry-run --limit=1, and let it roll.
Categories: LUG Community Blogs

Jonathan McDowell: Cup!

Thu, 08/01/2015 - 13:58

I got a belated Christmas present today. Thanks Jo + Simon!

Categories: LUG Community Blogs

MJ Ray: Social Network Wishlist

Thu, 08/01/2015 - 04:10

All I want for 2015 is a Free/Open Source Software social network which is:

  • easy to register on (no reCaptcha disability-discriminator or similar, a simple openID, activation emails that actually arrive);
  • has an email help address or online support or phone number or something other than the website which can be used if the registration system causes a problem;
  • can email when things happen that I might be interested in;
  • can email me summaries of what’s happened last week/month in case they don’t know what they’re interested in;
  • doesn’t email me too much (but this is rare);
  • interacts well with other websites (allows long-term members to post links, sends trackbacks or pingbacks to let the remote site know we’re talking about them, makes it easy for us to dent/tweet/link to the forum nicely, and so on);
  • isn’t full of spam (has limits on link-posting, moderators are contactable/accountable and so on, and the software gives them decent anti-spam tools);
  • lets me back up my data;
  • is friendly and welcoming and trolls are kept in check.

Is this too much to ask for? Does it exist already?

Categories: LUG Community Blogs

Chris Lamb: Web scraping: Let's move on

Wed, 07/01/2015 - 18:53

Every few days, someone publishes a new guide, tutorial, library or framework about web scraping, the practice of extracting information from websites where an API is either not provided or is otherwise incomplete.

However, I find these resources fundamentally deceptive — the arduous parts of "real world" scraping simply aren't in the parsing and extraction of data from the target page, the typical focus of these articles.

The difficulties are invariably in "post-processing"; working around incomplete data on the page, handling errors gracefully and retrying in some (but not all) situations, keeping on top of layout/URL/data changes to the target site, not hitting your target site too often, logging into the target site if necessary and rotating credentials and IP addresses, respecting robots.txt, target site being utterly braindead, keeping users meaningfully informed of scraping progress if they are waiting of it, target site adding and removing data resulting in a null-leaning database schema, sane parallelisation in the presence of prioritisation of important requests, difficulties in monitoring a scraping system due to its implicitly non-deterministic nature, and general problems associated with long-running background processes in web stacks.

Et cetera.

In other words, extracting the right text on the page is the easiest and trivial part by far, with little practical difference between an admittedly cute jQuery-esque parsing library or even just using a blunt regular expression.

It would be quixotic to simply retort that sites should provide "proper" APIs but I would love to see more attempts at solutions that go beyond the superficial.

Categories: LUG Community Blogs

Steve Engledow (stilvoid): Pinally

Wed, 07/01/2015 - 00:03

I've finally found a real, practical use for my Raspberry Pi: it's an always-on machine that I can ssh to and use to wake up my home media server :)

It doubles as yet another syncthing client to add to my set.

And of course, it's running Arch ;)

Categories: LUG Community Blogs

Steve Engledow (stilvoid): Keychain and GnuPG &gt;= 2.1

Fri, 02/01/2015 - 15:48

A while ago, I started using keychain to manage my ssh and gpg agents. I did this with the following in my .bashrc

# Start ssh-agent eval $(keychain --quiet --eval id_rsa)

Recently, arch updated gpg to version 2.1.1 which, as per the announcement, no longer requires the GPG_AGENT_INFO environment variable.

Unfortunately, tools like keychain don't know about that and still expect it to be set, leading to some annoying breakage.

My fix is a quick and dirty one; I appended the following to .bashrc

export GPG_AGENT_INFO=~/.gnupg/S.gpg-agent:$(pidof gpg-agent):1

:)

Categories: LUG Community Blogs

Steve Engledow (stilvoid): TODO

Fri, 02/01/2015 - 12:58

Here's this year's TODO diff as compared with last year.

New Year's Resolutions
  • Read even more (fiction and non-fiction)

  • Write at least one short story

  • Write some more games

  • Go horse riding

  • Learn some more turkish

  • Play a lot more guitar

    I did this but want to do more!

  • Lose at least a stone (in weight, from myself)

    I almost did this and then put it all back on again

  • Try to be less of a pedant (except when it's funny)

  • Try to be more funny ;)

    I'm sure I've achieved these ;)

  • Receive a lot less email

    In particularly, unsubscrive from things I don't read

  • Blog more

    Particularly about technical subjects

  • Write more software

  • Release more software

  • Be a better husband and father

    I think I'm doing alright but I'm sure I can do better

  • Improve or replace the engine I use for my blog

Categories: LUG Community Blogs