Planet HantsLUG

Planet HantsLUG - http://hantslug.org.uk/planet/

Andy Smith: XFS, Reflinks and Deduplication

Tue, 10/01/2017 - 20:45
btrfs Past

This post is about XFS but it’s about features that first hit Linux in btrfs, so we need to talk about btrfs for a bit first.

For a long time now, btrfs has had a useful feature called reflinks. Basically this is exposed as cp --reflink=always and takes advantage of extents and copy-on-write in order to do a quick copy of data by merely adding another reference to the extents that the data is currently using, rather than having to read all the data and write it out again, as would be the case in other filesystems.

Here’s an excerpt from the man page for cp:

When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy.

Without reflinks a common technique for making a quick copy of a file is the hardlink. Hardlinks have a number of disadvantages though, mainly due to the fact that since there is only one inode all hardlinked copies must have the same metadata (owner, group, permissions, etc.). Software that might modify the files also needs to be aware of hardlinks: naive modification of a hardlinked file modifies all copies of the file.

With reflinks, life becomes much easier:

  • Each copy has its own inode so can have different metadata. Only the data extents are shared.
  • The filesystem ensures that any write causes a copy-on-write, so applications don’t need to do anything special.
  • Space is saved on a per-extent basis so changing one extent still allows all the other extents to remain shared. A change to a hardlinked file requires a new copy of the whole file.

Another feature that extents and copy-on-write allow is block-level out-of-band deduplication.

  • Deduplication – the technique of finding and removing duplicate copies of data.
  • Block-level – operating on the blocks of data on storage, not just whole files.
  • Out-of-band – something that happens only when triggered or scheduled, not automatically as part of the normal operation of the filesystem.

btrfs has an ioctl that a userspace program can use—presumably after finding a sequence of blocks that are identical—to tell the kernel to turn one into a reference to the other, thus saving some space.

It’s necessary for the kernel to do this so that any IO going on at the same time, which might modify the data, can be dealt with; modifications made after the data is reflinked will just cause a copy-on-write. If you tried to do it all in a userspace app then you’d risk something else modifying the files at the same time, but by having the kernel do it, in theory it becomes completely safe to do at any time. The kernel also checks that the sequences of extents really are identical before sharing them.
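To make that concrete, here is a minimal C sketch of the sort of request a userspace deduplicator makes. It uses FIDEDUPERANGE, the VFS-level descendant of the btrfs extent-same ioctl discussed below, rather than the btrfs-specific call; the file paths are placeholders, error handling is abbreviated, and real tools split large files into smaller ranges and loop over them.

/*
 * Minimal sketch: ask the kernel to deduplicate the whole of fileB
 * against fileA.  Paths are placeholders; error handling is abbreviated.
 */
#include <fcntl.h>
#include <linux/fs.h>          /* FIDEDUPERANGE, struct file_dedupe_range */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int src = open("/path/to/fileA", O_RDONLY);
    int dst = open("/path/to/fileB", O_RDWR);
    struct stat st;

    if (src < 0 || dst < 0 || fstat(src, &st) < 0) {
        perror("open/fstat");
        return 1;
    }

    /* One source range and one destination range to compare against it. */
    struct file_dedupe_range *req =
        calloc(1, sizeof(*req) + sizeof(struct file_dedupe_range_info));
    req->src_offset = 0;
    req->src_length = st.st_size;
    req->dest_count = 1;
    req->info[0].dest_fd = dst;
    req->info[0].dest_offset = 0;

    /* The kernel re-reads and compares both ranges before sharing extents. */
    if (ioctl(src, FIDEDUPERANGE, req) < 0) {
        perror("FIDEDUPERANGE");
        return 1;
    }

    if (req->info[0].status == FILE_DEDUPE_RANGE_SAME)
        printf("deduplicated %llu bytes\n",
               (unsigned long long)req->info[0].bytes_deduped);
    else
        printf("ranges differed or dedupe was partial (status %d)\n",
               req->info[0].status);

    free(req);
    close(src);
    close(dst);
    return 0;
}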

In-band deduplication is a feature that’s being worked on in btrfs. It already exists in ZFS, though there it is rarely recommended for use as it requires a huge amount of memory for keeping hashes of data that has been written. It’s going to be the same story with btrfs, so out-of-band deduplication will remain useful. And it exists as a feature right now, which is always a bonus.

XFS Future

So what has all this got to do with XFS?

Well, in recognition that there might be more than one Linux filesystem with extents and so that reflinks might be more generally useful, the extent-same ioctl got lifted up to be in the VFS layer of the kernel instead of just in btrfs. And the good news is that XFS recently became able to make use of it.

When I say “recently” I do mean really recently. I mean like kernel release 4.9.1 which came out on 2017-01-04. At the moment it comes with massive EXPERIMENTAL warnings, requires a new filesystem to be created with a special format option, and will need an xfsprogs compiled from recent git in order to have a mkfs.xfs that can create such a filesystem.

So before going further, I’m going to assume you’ve compiled a new enough kernel and booted into it, then compiled up a new enough xfsprogs. Both of these are quite simple things to do, for example the Debian documentation for building kernel packages from upstream code works fine.

XFS Reflink Demo

Make yourself a new filesystem, with the reflink=1 format option.

# mkfs.xfs -L reflinkdemo -m reflink=1 /dev/xvdc
meta-data=/dev/xvdc              isize=512    agcount=4, agsize=3276800 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=1
data     =                       bsize=4096   blocks=13107200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=6400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Put it in /etc/fstab for convenience, and mount it somewhere.

# echo "LABEL=reflinkdemo /mnt/xfs xfs relatime 0 2" >> /etc/fstab # mkdir -vp /mnt/xfs mkdir: created directory ‘/mnt/xfs’ # mount /mnt/xfs # df -h /mnt/xfs Filesystem Size Used Avail Use% Mounted on /dev/xvdc 50G 339M 50G 1% /mnt/xfs

Create a few files with random data.

# mkdir -vp /mnt/xfs/reflink
mkdir: created directory ‘/mnt/xfs/reflink’
# chown -c andy: /mnt/xfs/reflink
changed ownership of ‘/mnt/xfs/reflink’ from root:root to andy:andy
# exit
$ for i in {1..5}; do
>   echo "Writing $i…"; dd if=/dev/urandom of=/mnt/xfs/reflink/$i bs=1M count=1024;
> done
Writing 1…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.34193 s, 247 MB/s
Writing 2…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33207 s, 248 MB/s
Writing 3…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33527 s, 248 MB/s
Writing 4…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33362 s, 248 MB/s
Writing 5…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.32859 s, 248 MB/s
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
$ du -csh /mnt/xfs
5.0G    /mnt/xfs
5.0G    total

Copy a file and as expected usage will go up by 1GiB. And it will take a little while, even on my nice fast SSDs.

$ time cp -v /mnt/xfs/reflink/{,copy_}1
‘/mnt/xfs/reflink/1’ -> ‘/mnt/xfs/reflink/copy_1’

real    0m3.420s
user    0m0.008s
sys     0m0.676s
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
6.0G    /mnt/xfs/reflink
6.0G    total

So what about a reflink copy?

$ time cp -v --reflink=always /mnt/xfs/reflink/{,reflink_}1
‘/mnt/xfs/reflink/1’ -> ‘/mnt/xfs/reflink/reflink_1’

real    0m0.003s
user    0m0.000s
sys     0m0.004s
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total

The apparent usage went up by 1GiB but the amount of free space as shown by df stayed the same. No more actual storage was used because the new copy is a reflink. And the copy got done in 4ms as opposed to 3,420ms.

Can we tell more about how these files are laid out? Yes, we can use the filefrag -v command to tell us more.

$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:     917508..   1179651: 262144:             last,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found

What we can see here is that all three files are composed of a single extent which is 262,144 4KiB blocks in size, but it also tells us that /mnt/xfs/reflink/1 and /mnt/xfs/reflink/reflink_1 are using the same range of physical blocks: 1572884..1835027.

XFS Deduplication Demo

We’ve demonstrated that you can use cp --reflink=always to take a cheap copy of your data, but what about data that may already be duplicates without your knowledge? Is there any way to take advantage of the extent-same ioctl for deduplication?

There are a couple of software solutions for out-of-band deduplication in btrfs, but the one that I know also works on XFS is duperemove. You will need to use a git checkout of duperemove for this to work.

A quick reminder of the storage use before we start.

$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total
$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:     917508..   1179651: 262144:             last,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found

Run duperemove.

# duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/reflink
Using 128K blocks
Using hash: murmur3
Gathering file list...
Adding files from database for hashing.
Loading only duplicated hashes from hashfile.
Using 2 threads for dedupe phase
Kernel processed data (excludes target files): 4.0G
Comparison of extent info shows a net change in shared extents of: 1.0G
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total
$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found

The output of du remained the same, but df says that there’s now 1GiB more free space, and filefrag confirms that what’s changed is that copy_1 now uses the same extents as 1 and reflink_1. The duplicate data in copy_1, which in theory we did not know was there, has been discovered and safely reference-linked to the extent from 1, saving us 1GiB of storage.

By the way, I told duperemove to use a hash file because otherwise it keeps its hashes in RAM. For the sake of 7 files that won’t matter, but it will if I have millions of files, so it’s a habit I get into. It also uses that hash file to avoid having to repeatedly re-hash files that haven’t changed.

All that has been demonstrated so far though is whole-file deduplication, as copy_1 was just a regular copy of 1. What about when a file is only partially composed of duplicate data? Well okay.

$ cat /mnt/xfs/reflink/{1,2} > /mnt/xfs/reflink/1_2
$ ls -lah /mnt/xfs/reflink/{1,2,1_2}
-rw-r--r-- 1 andy andy 1.0G Jan 10 15:41 /mnt/xfs/reflink/1
-rw-r--r-- 1 andy andy 2.0G Jan 10 16:55 /mnt/xfs/reflink/1_2
-rw-r--r-- 1 andy andy 1.0G Jan 10 15:41 /mnt/xfs/reflink/2
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  7.4G   43G  15% /mnt/xfs
9.0G    /mnt/xfs/reflink
9.0G    total
$ filefrag -v /mnt/xfs/reflink/{1,2,1_2}
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/2 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:         20..    262147: 262128:
   1:   262128..  262143:    2129908..   2129923:     16:     262148: last,eof
/mnt/xfs/reflink/2: 2 extents found
File size of /mnt/xfs/reflink/1_2 is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:     262164..    524291: 262128:
   1:   262128..  524287:     655380..    917539: 262160:     524292: last,eof
/mnt/xfs/reflink/1_2: 2 extents found

I’ve concatenated 1 and 2 together into a file called 1_2 and as expected, usage goes up by 2GiB. filefrag confirms that the physical extents in 1_2 are new. We should be able to do better because this 1_2 file does not contain any new unique data.

$ duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/reflink
Using 128K blocks
Using hash: murmur3
Gathering file list...
Adding files from database for hashing.
Using 2 threads for file hashing phase
Kernel processed data (excludes target files): 4.0G
Comparison of extent info shows a net change in shared extents of: 3.0G
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
9.0G    /mnt/xfs/reflink
9.0G    total

We can. Apparent usage stays at 9GiB but real usage went back to 5.4GiB which is where we were before we created 1_2.

And the physical layout of the files?

$ filefrag -v /mnt/xfs/reflink/{1,2,1_2}
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/2 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:         20..    262147: 262128:             shared
   1:   262128..  262143:    2129908..   2129923:     16:     262148: last,shared,eof
/mnt/xfs/reflink/2: 2 extents found
File size of /mnt/xfs/reflink/1_2 is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             shared
   1:   262144..  524271:         20..    262147: 262128:    1835028: shared
   2:   524272..  524287:    2129908..   2129923:     16:     262148: last,shared,eof
/mnt/xfs/reflink/1_2: 3 extents found

It shows that 1_2 is now made up from the same extents as 1 and 2 combined, as expected.

Less of the urandom

These synthetic demonstrations using a handful of 1GiB blobs of data from /dev/urandom are all very well, but what about something a little more like the real world?

Okay well let’s see what happens when I take ~30GiB of backup data created by rsnapshot on another host.

rsnapshot is a backup program which makes heavy use of hardlinks. It runs periodically and compares the previous backup data with the new. If they are identical then instead of storing an identical copy it makes a hardlink. This saves a lot of space but does have a lot of limitations as discussed previously.

This won’t be the best example because in some ways there is expected to be more duplication; this data is composed of multiple backups of the same file trees. But on the other hand there shouldn’t be as much because any truly identical files have already been hardlinked together by rsnapshot. But it is a convenient source of real-world data.

So, starting state:

(I deleted all the reflink files)

$ df -h /mnt/xfs; sudo du -csh /mnt/xfs/rsnapshot
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   30G   21G  59% /mnt/xfs
29G     /mnt/xfs/rsnapshot
29G     total

A small diversion about how rsnapshot lays out its backups may be useful here. They are stored like this:

  • rsnapshot_root / [iteration a] / [client foo] / [directory structure from client foo]
  • rsnapshot_root / [iteration a] / [client bar] / [directory structure from client bar]
  • rsnapshot_root / [iteration b] / [client foo] / [directory structure from client foo]
  • rsnapshot_root / [iteration b] / [client bar] / [directory structure from client bar]

The iterations are commonly things like daily.0, daily.1, …, daily.6. As a consequence, the paths:

rsnapshot/daily.*/client_foo

would be backups only from host foo, and:

rsnapshot/daily.0/*

would be backups from all hosts but only the most recent daily sync.

Let’s first see what the savings would be like in looking for duplicates in just one client’s backups.

Here’s the backups I have in this blob of data. The names of the clients are completely made up, though they are real backups.

Client    Size (MiB)
darbee        14,504
achorn        11,297
spader         2,612
reilly         2,276
chino          2,203
audun          2,184

So let’s try deduplicating all of the biggest one’s—darbee’s—backups:

$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   30G   21G  59% /mnt/xfs
# time duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/rsnapshot/*/darbee
Using 128K blocks
Using hash: murmur3
Gathering file list...
Kernel processed data (excludes target files): 8.8G
Comparison of extent info shows a net change in shared extents of: 6.8G
9.85user 78.70system 3:27.23elapsed 42%CPU (0avgtext+0avgdata 23384maxresident)k
50703656inputs+790184outputs (15major+20912minor)pagefaults 0swaps
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   25G   26G  50% /mnt/xfs

3m27s of run time, somewhere between 5 and 6.8GiB saved. That’s 35%!

Now to deduplicate the lot.

# time duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/rsnapshot
Using 128K blocks
Using hash: murmur3
Gathering file list...
Kernel processed data (excludes target files): 5.4G
Comparison of extent info shows a net change in shared extents of: 3.4G
29.12user 188.08system 5:02.31elapsed 71%CPU (0avgtext+0avgdata 34040maxresident)k
34978360inputs+572128outputs (18major+45094minor)pagefaults 0swaps
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   23G   28G  45% /mnt/xfs

5m02 used this time, and another 2–3.4G saved.

Since the actual deduplication does take some time (the kernel having to read the extents, mainly), and most of it was already done in the first pass, a full pass would more likely take the sum of the times, i.e. more like 8m29s.

Still, a total of about 7GiB was saved which is 23%.

It would be very interesting to try this on one of my much larger backup stores.

Why Not Just Use btrfs?

Using a filesystem that already has all of these features would certainly seem easier, but I personally don’t think btrfs is stable enough yet. I use it at home in a relatively unexciting setup (8 devices, raid1 for data and metadata, no compression or deduplication) and I wish I didn’t. I wouldn’t dream of using it in a production environment yet.

I’m on the btrfs mailing list and there are way too many posts regarding filesystems that give ENOSPC and become unavailable for writes, or systems that were unexpectedly powered off and when powered back on the btrfs filesystem is completely lost.

I expect the reflink feature in XFS to become non-experimental before btrfs is stable enough for production use.

ZFS?

ZFS is great. It doesn’t have out-of-band deduplication or reflinks though, and there are no plans to add them any time soon.

Categories: LUG Community Blogs

Debian Bits: New Debian Developers and Maintainers (November and December 2016)

Sun, 08/01/2017 - 23:30

The following contributors got their Debian Developer accounts in the last two months:

  • Karen M Sandler (karen)
  • Sebastien Badia (sbadia)
  • Christos Trochalakis (ctrochalakis)
  • Adrian Bunk (bunk)
  • Michael Lustfield (mtecknology)
  • James Clarke (jrtc27)
  • Sean Whitton (spwhitton)
  • Jerome Georges Benoit (calculus)
  • Daniel Lange (dlange)
  • Christoph Biedl (cbiedl)
  • Gustavo Panizzo (gefa)
  • Gert Wollny (gewo)
  • Benjamin Barenblat (bbaren)
  • Giovani Augusto Ferreira (giovani)
  • Mechtilde Stehmann (mechtilde)
  • Christopher Stuart Hoskin (mans0954)

The following contributors were added as Debian Maintainers in the last two months:

  • Dmitry Bogatov
  • Dominik George
  • Gordon Ball
  • Sruthi Chandran
  • Michael Shuler
  • Filip Pytloun
  • Mario Anthony Limonciello
  • Julien Puydt
  • Nicholas D Steeves
  • Raoul Snyman

Congratulations!

Categories: LUG Community Blogs

Steve Kemp: Patching scp and other updates.

Sun, 08/01/2017 - 16:39

I use openssh every day, be it the ssh command for connecting to remote hosts, or the scp command for uploading/downloading files.

Once a day, or more, I forget that scp uses the non-obvious -P flag for specifying the port, not the -p flag that ssh uses.

Enough is enough. I shall not file a bug report against the Debian openssh-client package, because no doubt compatibility with both upstream, and other distributions, is important. But damnit I've had enough.

apt-get source openssh-client shows the appropriate code:

fflag = tflag = 0;
while ((ch = getopt(argc, argv, "dfl:prtvBCc:i:P:q12346S:o:F:")) != -1)
	switch (ch) {
	..
	..
	case 'P':
		addargs(&remote_remote_args, "-p");
		addargs(&remote_remote_args, "%s", optarg);
		addargs(&args, "-p");
		addargs(&args, "%s", optarg);
		break;
	..
	..
	case 'p':
		pflag = 1;
		break;
	..
	..
	..

Swapping those two flags around, and updating the format string appropriately, was sufficient to do the necessary.
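For illustration, the swapped version might look something like this - a sketch of the change described above rather than the exact patch applied:

/* Sketch of the swap, not the exact patch: scp now takes the port via -p
 * (like ssh) and preserves times/modes with -P.  The getopt() string moves
 * the ':' from 'P' to 'p' so that -p takes an argument. */
while ((ch = getopt(argc, argv, "dfl:p:rtvBCc:i:Pq12346S:o:F:")) != -1)
	switch (ch) {
	case 'p':	/* port: forwarded to ssh as -p */
		addargs(&remote_remote_args, "-p");
		addargs(&remote_remote_args, "%s", optarg);
		addargs(&args, "-p");
		addargs(&args, "%s", optarg);
		break;
	case 'P':	/* preserve modification times and modes */
		pflag = 1;
		break;
	/* ... remaining cases unchanged ... */
	}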

In other news I've done some hardware development, using both Arduino boards and the WeMos D1-mini. I'm still at the stage where I'm flashing lights, and doing similarly trivial things.

I have more complex projects planned for the future, but these are on-hold until the appropriate parts are delivered:

  • MP3 playback.
  • Bluetooth-speakers.
  • Washing machine alarm.
  • LCD clock, with time set by NTP, and relay control.

Even with a few LEDs though I've had fun, for example writing a trivial binary display.
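Something in that spirit might look like the following sketch - a guess at the idea rather than the actual code, with arbitrary pin numbers:

/* A guess at a "trivial binary display": count 0-15 and show each value
 * in binary on four LEDs.  Pin numbers and timing are arbitrary examples. */
const int ledPins[4] = {2, 3, 4, 5};     /* bit 0 .. bit 3 */

void setup() {
  for (int i = 0; i < 4; i++)
    pinMode(ledPins[i], OUTPUT);
}

void loop() {
  for (int value = 0; value < 16; value++) {
    for (int bit = 0; bit < 4; bit++)
      digitalWrite(ledPins[bit], (value >> bit) & 1 ? HIGH : LOW);
    delay(500);                          /* half a second per count */
  }
}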

Categories: LUG Community Blogs

Steve Kemp: So I'm gonna start doing arduino-things

Sat, 31/12/2016 - 19:11

Since I've got a few weeks off I've decided I need to find a project, or two, to occupy me. Happily the baby is settling in well, mostly he sleeps for 4-5 hours, then eats, before the cycle repeats. It could have been so much worse.

My plan is to start exploring Arduino-related projects. It has been years since I touched hardware, with the exception of building a new PC for myself every 12-48 months.

There are a few "starter kits" you can buy, consisting of a board, and some discrete components such as a bunch of buttons, an LCD-output screen, some sensors (pressure, water, tilt), etc.

There are also some nifty little pre-cooked components you can buy, such as the MP3-playback device mentioned below.

The appeal of the former is that I can get the hang of marrying hardware with software, and the appeal of the latter is that the whole thing is pre-built, so I don't need to worry about anything complex. Looking over similar builds people have made, the process is more akin to building with Lego than real hardware-assembling.

So, for the next few weeks my plan is to:

  • Explore the various sensors, and tutorials, via the starter-kit.
  • Wire the MP3-playback device to a wireless D1-mini-board.
    • Which will allow me to listen to (static) music stored on an SD-card.
    • And sending "next", "previous", "play", "volume-up", etc, via a mobile.

The end result should be that I will be able to listen to music in my living room. Albeit in a constrained fashion (if I want to change the music I'll have to swap out the files on the SD-card). But it's something that's vaguely useful, and something that I think is within my capability, even as a beginner.

I'm actually not sure what else I could usefully do, but I figured I could probably wire up a vibration sensor to another wireless board. The device can sit on the top of my washing machine:

  • If vibration is sensed move into the "washing is on" state.
    • If vibration stops after a few minutes move into the "washing machine done" state.
      • Send a HTTP GET-request, which will trigger an SMS/similar.

There's probably more to it than that, but I expect that a simple vibration sensor will be sufficient to allow me to get an alert of some kind when the washing machine is ready to be emptied - and I don't need to poke inside the guts of the washing machine, nor hang reed-switches off the door, etc.
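A first cut of that state machine might look something like the following sketch. Everything in it is hypothetical - the pin, the quiet-time threshold and the notification are placeholders for whatever the real build ends up using:

/* Hypothetical sketch of the washing-machine watcher described above.
 * The pin number, threshold and timings are made-up examples and the
 * notification is left as a stub. */
const int vibrationPin = 2;                            /* digital vibration sensor */
const unsigned long quietLimit = 5UL * 60UL * 1000UL;  /* "done" after 5 quiet minutes */

bool washing = false;
unsigned long lastVibration = 0;

void notifyDone() {
  /* Stub: send the HTTP GET that triggers the SMS/similar. */
}

void setup() {
  pinMode(vibrationPin, INPUT);
}

void loop() {
  unsigned long now = millis();

  if (digitalRead(vibrationPin) == HIGH) {
    lastVibration = now;
    washing = true;                      /* vibration seen: machine is running */
  }

  if (washing && (now - lastVibration) > quietLimit) {
    washing = false;                     /* quiet long enough: cycle finished */
    notifyDone();
  }

  delay(100);
}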

Anyway the only downside to my plan is that no doubt shipping the toys from AliExpress will take 2-4 weeks. Oops.

Categories: LUG Community Blogs

Steve Kemp: I finally made something worthwhile.

Mon, 26/12/2016 - 09:34

So for once I made something useful.

Oiva Adam Kemp.

Happy Christmas, if you believe in that kind of thing.

Categories: LUG Community Blogs

Debian Bits: Free FPGA programming with Debian

Thu, 22/12/2016 - 17:15

FPGAs (Field Programmable Gate Arrays) are increasingly popular for data acquisition, device control and application acceleration. Debian now features a completely Free set of tools to program FPGAs in Verilog, prepare the binary and have it executed on an affordable device.

See http://wiki.debian.org/FPGA/Lattice for details. Readers familiar with the technology may rightly guess that this refers to the yosys package together with berkeley-abc, arachne-pnr (place and route) and the icestorm tools to communicate with the device.

The packages have been contributed by the Debian Science team.

We hope this effort helps the FPGA community to collect an increasing number of skills, further smoothing the Open Source experience and lowering the entry barriers to this tantalising technology.

Categories: LUG Community Blogs

Steve Kemp: A simple Perl alternative to storing data in Redis

Thu, 15/12/2016 - 22:00

I continue to be a big user of Perl, and for many of my sites I avoid the use of MySQL which means that I largely store data in flat files, SQLite databases, or in memory via Redis.

One of my servers was recently struggling with RAM, and the surprising cause was "too much data" in Redis. (Surprising because I'd not been paying attention and hadn't noticed how popular it had become, and also because ASCII text compresses pretty well.)

Read/Write speed isn't a real concern, so I figured I'd move the data into an SQLite database, but that would require rewriting the application.

The client library for Perl is pretty awesome, and simple usage looks like this:

# Connect to localhost.
my $r = Redis->new();

# simple storage
$r->set( "key", "value" );

# Work with sets
$r->sadd( "fruits", "orange" );
$r->sadd( "fruits", "apple" );
$r->sadd( "fruits", "blueberry" );
$r->sadd( "fruits", "banannanananananarama" );

# Show the set-count
print "There are " . $r->scard( "fruits" ) . " known fruits";

# Pick a random one
print "Here is a random one " . $r->srandmember( "fruits" ) . "\n";

I figured that if I ignored the Lua support and the other more complex operations, creating a compatible API implementation wouldn't be too hard. So rather than porting my application to use SQLite directly I could just use a different client-library.

In short I change this:

use Redis;
my $r = Redis->new();

To this:

use Redis::SQLite;
my $r = Redis::SQLite->new();

And everything continues to work. I've implemented all the set-related functions except one, and a random smattering of the other simple operations.

The appropriate test-cases in the Redis client library (i.e. removing all references to things I didn't implement) pass, and my own new tests also make me confident.

It's obviously not a hard job, but it was a quick solution to a real problem and might be useful to others.

My image hosting site, and my markdown sharing site now both use this wrapper and seem to be performing well - but with more free RAM.

No doubt I'll add more of the simple primitives as time goes on, but so far I've done enough to be useful.

Categories: LUG Community Blogs

Andy Smith: Supermicro SATA DOM flash devices don’t report lifetime writes correctly

Sat, 26/11/2016 - 16:43

I’m playing around with a pair of Supermicro SATA DOM flash devices at the moment, evaluating them for use as the operating system storage for servers (as opposed to where customer data goes).

They’re flash devices with a limited write endurance. The smallest model (16GB), for example, is good for 17TB of writes. Therefore it’s important to know how much you’ve actually written to it.

Many SSDs and other flash devices expose the total amount written through the SMART attribute 241, Total_LBAs_Written. The SATA DOM devices do seem to expose this attribute, but right now they say this:

$ for dom in $(sudo lsblk --paths -d -o NAME,MODEL --noheadings | awk '/SATA SSD/ { print $1 }')
do
  echo -n "$dom: "
  sudo smartctl -A "$dom" | awk '/^241/ { print $10 * 512 * 1.0e-9, "GB" }'
done
/dev/sda: 0.00856934 GB
/dev/sdb: 0.00881715 GB

This being after install and (as of now) more than a week of uptime, ~9MB of lifetime writes isn’t credible.

Another place we can look for amount of bytes written is /proc/diskstats. The 10th column is the number of (512-byte) sectors written, so:

$ for dom in $(sudo lsblk -d -o NAME,MODEL --noheadings | awk '/SATA SSD/ { print $1 }')
do
  awk "/$dom / { print \$3, \$10 / 2 * 1.0e-6, \"GB\" }" /proc/diskstats
done
sda 3.93009 GB
sdb 3.93009 GB

Almost 4GB is a lot more believable, so can we just use /proc/diskstats? Well, the problem there is that those figures are only since boot. That won’t include, for example, all the data written during install.

Okay, so, are these figures even consistent? Let’s write 100MB and see what changes.

Since the figure provided by SMART attribute 241 apparently isn’t actually 512-byte blocks we’ll just print the raw value there.

Before:

$ for dom in $(sudo lsblk -d -o NAME,MODEL --noheadings | awk '/SATA SSD/ { print $1 }')
do
  awk "/$dom / { print \$3, \$10 / 2 * 1.0e-6, \"GB\" }" /proc/diskstats
done
sda 4.03076 GB
sdb 4.03076 GB
$ for dom in $(sudo lsblk --paths -d -o NAME,MODEL --noheadings | awk '/SATA SSD/ { print $1 }')
do
  echo -n "$dom: "
  sudo smartctl -A "$dom" | awk '/^241/ { print $10 }'
done
/dev/sda: 16835
/dev/sdb: 17318

Write 100MB:

$ dd if=/dev/urandom bs=1MB count=100 > /var/tmp/one_hundred_megabytes
100+0 records in
100+0 records out
100000000 bytes (100 MB) copied, 7.40454 s, 13.5 MB/s

(I used /dev/urandom just in case some compression might take place or something)

After:

$ for dom in $(sudo lsblk -d -o NAME,MODEL --noheadings | awk '/SATA SSD/ { print $1 }')
do
  awk "/$dom / { print \$3, \$10 / 2 * 1.0e-6, \"GB\" }" /proc/diskstats
done
sda 4.13046 GB
sdb 4.13046 GB
$ for dom in $(sudo lsblk --paths -d -o NAME,MODEL --noheadings | awk '/SATA SSD/ { print $1 }')
do
  echo -n "$dom: "
  sudo smartctl -A "$dom" | awk '/^241/ { print $10 }'
done
/dev/sda: 16932
/dev/sdb: 17416

Well, alright, all is apparently not lost: SMART attribute 241 went up by ~100 and diskstats agrees that ~100MB was written too, so it looks like it does actually report lifetime writes, but it's reporting them in megabytes (10⁶ bytes), not 512-byte sectors.

Every reference I can find says that Total_LBAs_Written is the number of 512-byte sectors, though, so in reporting units of 1MB I feel that these devices are doing the wrong thing.
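As a quick sanity check of the two interpretations, here is the arithmetic for the raw value of 16932 seen above (assuming "megabytes" here means 10⁶ bytes):

/* Interpret the raw SMART attribute 241 value from above (16932) both as
 * 512-byte sectors (the documented meaning) and as megabytes (what these
 * DOMs appear to report). */
#include <stdio.h>

int main(void)
{
    unsigned long long raw = 16932;               /* value read via smartctl above */

    double as_sectors_gb   = raw * 512.0 / 1e9;   /* documented: 512-byte sectors */
    double as_megabytes_gb = raw * 1e6   / 1e9;   /* apparent: 1 MB units */

    printf("as 512-byte sectors: %.4f GB\n", as_sectors_gb);    /* ~0.0087 GB */
    printf("as megabytes:        %.3f GB\n", as_megabytes_gb);  /* ~16.9 GB */
    return 0;
}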

Anyway, I’m a little alarmed that ~0.1% of the lifetime has gone already, although a lot of that would have been the install. I probably should take this opportunity to get rid of a lot of writes by tracking down logging of mundane garbage. Also this is the smallest model; the devices are rated for 1 DWPD so just over-provisioning by using a larger model than necessary will help.

Categories: LUG Community Blogs

Steve Kemp: Detecting fraudulent signups?

Mon, 21/11/2016 - 05:37

I run a couple of different sites that allow users to sign-up and use various services. In each of these sites I have some minimal rules in place to detect bad signups, but these are a little ad hoc, because the nature of "badness" varies on a per-site basis.

I've worked in a couple of places where there are in-house tests of bad signups, and these usually boil down to some naive, and overly-broad, rules:

  • Does the phone number's (international) prefix match the country of the user?
  • Does the postal address supplied even exist?

Some places penalise users based upon location too:

  • Does the IP address the user submitted from come from TOR?
  • Does the geo-IP country match the user's stated location?
  • Is the email address provided by a "free" provider?

At the moment I've got a simple HTTP-server which receives a JSON post of a new user's details, and returns "200 OK" or "403 Forbidden" based on some very, very simple criteria. This is modeled on the spam-detection service for blog comments that I use - something that is itself becoming less useful over time. (Perhaps time to kill that? A decision for another day.)

Unfortunately this whole approach is very reactive, as it takes human eyeballs to detect new classes of problems. Code can't guess in advance that it should block usernames which could collide with official ones - for example "admin", "help", or "support".

I'm certain that these systems have been written a thousand times, as I've seen at least five such systems, and they're all very similar. The biggest flaw in all these systems is that they try to classify users in advance of them doing anything. We're trying to say "Block users who will use stolen credit cards", or "Block users who'll submit spam", by correlating that behaviour with other things. In an ideal world you'd judge users only by the actions they take, not how they signed up. And yet .. it is better than nothing.

For the moment I'm continuing to try to make the best of things, at least by centralising the rules for myself I cut down on duplicate code. I'll pretend I'm being cool, modern, and sexy, and call this a micro-service! (Ignore the lack of containers for the moment!)

Categories: LUG Community Blogs

Debian Bits: Debian Contributors Survey 2016

Wed, 16/11/2016 - 14:45

The Debian Contributor Survey launched last week!

In order to better understand and document who contributes to Debian, we (Mathieu O'Neil, Molly de Blanc, and Stefano Zacchiroli) have created this survey to capture the current state of participation in the Debian Project through the lens of common demographics. We hope a general survey will become an annual effort, and that each year there will also be a focus on a specific aspect of the project or community. The 2016 edition contains sections concerning work, employment, and labour issues in order to learn about who is getting paid to work on and with Debian, and how those relationships affect contributions.

We want to hear from as many Debian contributors as possible—whether you've submitted a bug report, attended a DebConf, reviewed translations, maintain packages, participated in Debian teams, or are a Debian Developer. Completing the survey should take 10-30 minutes, depending on your current involvement with the project and employment status.

In an effort to reflect our own ideals as well as those of the Debian project, we are using LimeSurvey, an entirely free software survey tool, in an instance of it hosted by the LimeSurvey developers.

Survey responses are anonymous, IP and HTTP information are not logged, and all questions are optional. As it is still likely possible to determine who a respondent is based on their answers, results will only be distributed in aggregate form, in a way that does not allow deanonymization. The results of the survey will be analyzed as part of ongoing research work by the organizers. A report discussing the results will be published under a DFSG-free license and distributed to the Debian community as soon as it's ready. The raw, disaggregated answers will not be distributed and will be kept under the responsibility of the organizers.

We hope you will fill out the Debian Contributor Survey. The deadline for participation is: 4 December 2016, at 23:59 UTC.

If you have any questions, don't hesitate to contact us via email at:

Categories: LUG Community Blogs

Debian Bits: New Debian Developers and Maintainers (September and October 2016)

Thu, 03/11/2016 - 11:00

The following contributors got their Debian Developer accounts in the last two months:

  • Adriano Rafael Gomes (adrianorg)
  • Arturo Borrero González (arturo)
  • Sandro Knauß (hefee)

The following contributors were added as Debian Maintainers in the last two months:

  • Abhijith PA
  • Mo Zhou
  • Víctor Cuadrado Juan
  • Zygmunt Bazyli Krynicki
  • Robert Haist
  • Sunil Mohan Adapa
  • Elena Grandi
  • Eric Heintzmann
  • Dylan Aïssi
  • Daniel Shahaf
  • Samuel Henrique
  • Kai-Chung Yan
  • Tino Mettler

Congratulations!

Categories: LUG Community Blogs

Debian Bits: "softWaves" will be the default theme for Debian 9

Tue, 25/10/2016 - 17:50

The theme "softWaves" by Juliette Taka Belin has been selected as default theme for Debian 9 'stretch'.

After the Debian Desktop Team made the call for proposing themes, a total of twelve choices were submitted, and every Debian contributor had the opportunity to vote on them in a survey. We received 3,479 responses ranking the different choices, and softWaves was the winner among them.

We'd like to thank all the designers who participated, providing nice wallpapers and artwork for Debian 9, and we encourage everybody interested in this area of Debian to join the Design Team. Packaging all of the submitted artwork so it is easily available in Debian is being considered. If you want to help in this effort, or package any other artwork (for example, designed particularly to be accessibility-friendly), please contact the Debian Desktop Team - but hurry up, because the freeze for new packages in the next release of Debian starts on January 5th, 2017.

This is the second time that Debian ships a theme by Juliette Belin, who also created the theme "Lines" that enhances our current stable release, Debian 8. Congratulations, Juliette, and thank you very much for your continued commitment to Debian!

Categories: LUG Community Blogs

Steve Kemp: This blog has moved

Mon, 17/10/2016 - 10:40
This blog has moved to https://blog.steve.fi/. Please update to use the new feed location.
Categories: LUG Community Blogs
