
There's an iOS device attached to my Google account and I don't own any Apple products

Green lock icon by Karol Szapsza

Update: It turns out the mysterious iOS device was due to the fact that I was using purple-hangouts to connect to Google’s chat service. Since it uses undocumented APIs, it must identify itself as an iOS device. When I revoke the iOS device, my chat client disconnects and I am required to re-authenticate. I’m guessing the YouTube plays problem stems from a known issue where paused videos randomly start playing in background tabs, which I have experienced.

The other day I was searching through my YouTube history and discovered a ton of garbage pop music videos that I’ve never viewed. I always turn auto-play off, so the presence of these videos in my history was puzzling. I immediately checked my Google account history to look for unauthorized access. Within the list was an iOS device. I haven’t owned any Apple products since my MacBook was stolen two years ago, so I immediately revoked that device’s access and changed my password.

Crappy pop music in my YouTube history that I've never played

I use a secure password algorithm and my Google password isn’t used for any other sites. It’s not written down anywhere or stored digitally on any of my systems. Could I have inadvertently entered it into a phishing site without realizing it? Another puzzling detail involves a security mechanism Google has that e-mails users whenever a new device is added. I verified that I have received an e-mail every time I’ve re-flashed or added a new device to my Google account for at least the past year:

E-mail Google notifications for account changes

Yet there is no e-mail for this mysterious iOS device. There isn’t even a suspicious login attempt e-mail. I currently don’t have two-factor authentication enabled, but I’m curious whether this device could still have been connected if two-factor had been turned on.

An unknown iOS device

I haven’t yet re-authorized any of my Android devices. Although one is on the latest Cyanogen, the other has a locked boot loader and is dependent on the few and far between updates from the manufacturer. Because Android has no real package management and its system data is stored on a read-only system partition, security updates require considerably more work than on other operating systems, and manufacturers are known to leave unpatched versions of glibc, openssl and the built-in web browser in the wild for months, if not forever.

There is a possibility one of my Android devices could have been compromised, but that should have only given an attacker access to that device’s security token, not the ability to add new devices, and certainly not the ability to add devices without a notification e-mail.

Although a paid Google Apps for Business subscription would potentially give me a full audit history to see the potential damage, a free personal Google account does not. My location history, device information history and voice/audio activity logging have all been disabled in Google’s account management. The mysterious YouTube videos, over 70 of them, start on May 4th at 12:30pm and end at 5:57pm (no timezone information is given). There is nothing unexpected in my Google Search history for that time period and nothing odd in my Google payment history.

I know what you’re thinking: I had auto-play enabled, someone sent me a link to some garbage pop song, I started listening and walked away from my laptop. That would make the most sense, and I’d be willing to entertain the idea if it weren’t for the fact that I didn’t recognize the first mysterious video, auto-play is disabled on my account and a mysterious iOS device was authorized.

I do run my own e-mail server, so it is possible my password was compromised and I simply missed the new device e-mails. I find this unlikely, since my e-mail server has significantly fewer false positives for spam than Gmail’s. The lack of an e-mail for the iOS device is what disturbs me the most. It raises my suspicion that this particular attack may have been accomplished using a vulnerability within Google’s architecture and not directly against my account credentials.

If this was an attack, the results seem odd. A Gmail account wasn’t added (I currently do not have a Gmail account associated with my Google account), no payments were made and I haven’t found any other noticeable signs of activity. So why would an attacker use my account to simply play YouTube videos?

These were all monetized music videos with literally millions of views. Access to real accounts (versus accounts created just for spam) could allow an attacker to generate revenue off those videos while avoiding Google’s detection system for view-generating robots. The content creators may not even be aware that hacked accounts are being used to inflate their revenues. They may simply have a contract with a marketing company that promises video impressions and delivers them via exploited accounts.

Whatever the situation may be, it’s impossible to find out for sure. Google’s account management interface lets me see when a device last accessed my account, but doesn’t show me when that device was added or give me an audit trail of what was done under a given connection. Unless I can report a specific vulnerability, the standard security and product forums simply direct users to remove untrusted devices and change their passwords.


Android Fragmentation: Why the Firmware Model Doesn't Work for General Purpose Operating Systems

Android Boot Screen

When it comes to most general purpose operating systems, including Windows, Mac OS X and many Linux desktop distributions, an end user can wipe a device and reinstall that operating system from scratch. So long as the hardware is supported, or has available device drivers, the machine can work with a stock version of the operating system. Embedded systems, by contrast, often use firmware: a combination of an operating system and applications, typically stored on read-only storage, tailored specifically for a device and hardwired for a limited set of functionality.

When Google originally purchased Android Inc in 2005, its development and releases of Android for cellphones were treated more like firmware than a general purpose operating system. As Android has grown, manufacturers use the Android Open Source Project (AOSP) as a base, modifying it for each of their individual handsets. The result is that users are now dependent on operating system updates from each manufacturer, leaving many devices with obsolete versions of software or, worse, major unpatched security vulnerabilities. This is what’s known as Android fragmentation.

On an Android device, the standard partition layout consists of six partitions: boot, system, recovery, data, cache and misc. Here is the partition layout from one Android device:

## ls -al /dev/block/platform/msm_sdcc.1/by-name
... DDR -> /dev/block/mmcblk0p17
... FOTAKernel -> /dev/block/mmcblk0p16
... LTALabel -> /dev/block/mmcblk0p18
... TA -> /dev/block/mmcblk0p1
... aboot -> /dev/block/mmcblk0p5
... alt_aboot -> /dev/block/mmcblk0p11
... alt_dbi -> /dev/block/mmcblk0p10
... alt_rpm -> /dev/block/mmcblk0p12
... alt_s1sbl -> /dev/block/mmcblk0p9
... alt_sbl1 -> /dev/block/mmcblk0p8
... alt_tz -> /dev/block/mmcblk0p13
... apps_log -> /dev/block/mmcblk0p22
... boot -> /dev/block/mmcblk0p14
... cache -> /dev/block/mmcblk0p24
... dbi -> /dev/block/mmcblk0p4
... fsg -> /dev/block/mmcblk0p21
... modemst1 -> /dev/block/mmcblk0p19
... modemst2 -> /dev/block/mmcblk0p20
... ramdump -> /dev/block/mmcblk0p15
... rpm -> /dev/block/mmcblk0p6
... s1sbl -> /dev/block/mmcblk0p3
... sbl1 -> /dev/block/mmcblk0p2
... system -> /dev/block/mmcblk0p23
... tz -> /dev/block/mmcblk0p7
... userdata -> /dev/block/mmcblk0p25

As seen above, the actual partition layout can vary greatly from device to device. There are also additional mount points for internal/external SD cards, but for the sake of simplicity, we’re only going to focus on the system, data and recovery partitions.

The system partition contains the core of the Android operating system. All the essential libraries, the base user interface and the non-removable system apps reside on this partition. In the Linux world, this is the root or / partition. Unlike desktop or server Linux systems, this partition is typically read-only during standard Android operation. There are other Linux systems with read-only root partitions: many home Internet routers and satellite navigation systems use Linux as their base and have similar partition layouts with non-writable root file systems.
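You can see this for yourself on most devices. The commands and output below are a rough sketch; the exact block device and mount options differ per device, and remounting requires root:

$ adb shell mount | grep system
/dev/block/mmcblk0p23 /system ext4 ro,seclabel,relatime,data=ordered 0 0

# On a rooted device the partition can be temporarily remounted read-write
$ adb shell su -c 'mount -o remount,rw /system'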

Package Management

On a standard Linux distribution such as Ubuntu, Fedora, Arch or Gentoo, most of the program files on the root partition are maintained by a package manager. The package manager communicates with central repositories to install services, programs and security updates. Shared libraries and subsystems are listed as dependencies. When installing a package, the package manager typically installs those dependencies automatically, and they can be shared between different packages.
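As a quick illustration on a Debian or Ubuntu system (the package names and output below are abridged and illustrative):

$ sudo apt-get install gimp
The following additional packages will be installed:
  gimp-data libgimp2.0 ...

The shared libraries pulled in here can later satisfy the dependencies of other packages, and a single upgrade run patches them for every program that uses them.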

An alternative type of package is a monolithic one, where all the dependencies are included in the package itself. Examples include Ubuntu’s Snappy, the standard .app container in Mac OS X, Java web applications (war/ear files) and Android’s apk packages. Including all necessary dependencies can reduce breakage due to incompatibilities and ensure better support. The trade-offs include wasting space by duplicating dependencies and making each package responsible for security updates of embedded libraries.

There’s another class of package known as an installer. Windows users are most familiar with installers which typically update any shared dependencies that are out of date. Some Mac OS X programs use installers as well; typically those that need to install shared libraries, services or require administrator access to the system.

App Stores

Apple’s app store, released originally with the iPhone, is basically a crippled version of what had already existed on Linux systems for over a decade. It’s a package manager that connects to Apple’s repository of monolithic packages. Although the repositories for most Linux distributions have a process for getting new packages added, most Linux package managers also allow adding 3rd party repositories.

What that means is that if I’m a developer or maintainer for a Linux program and I can’t get my package into Red Hat’s or Ubuntu’s package repository, I can simply create my own repository. Users can be given instructions for adding this repository along with its digital signing key. Even if a project has its packages in official repositories, the project owners may decide to maintain their own repository as well, for users who want more up-to-date versions of their software or bleeding edge beta releases.
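On an apt-based system, adding such a repository might look something like this (the URL, key and package name are hypothetical placeholders):

# Import the repository's signing key and add its package source
$ wget -qO - https://example.org/repo/signing-key.asc | sudo apt-key add -
$ echo 'deb https://example.org/repo/debian stable main' | sudo tee /etc/apt/sources.list.d/example.list
$ sudo apt-get update && sudo apt-get install example-package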

Google Play and Microsoft’s app store are similar to Apple’s. They use a single repository, controlled by their respective companies, to distribute packages that go through an approval process. Although Linux distributions may have an approval process for official packages, it’s typically handled using issue trackers and is considerably more open than Apple or Google’s.

The iOS/Android Base

Android Home Screen

Android packages (apk files) are (mostly) monolithic. Although some packages exist on the system partition (packages distributed with the phone by the manufacturer that cannot be uninstalled without root access), most exist on the user’s data partition. However, the system partition itself is not managed by any package manager.

When an Android update is available, it is downloaded to the data partition and then the phone reboots into the recovery partition. The recovery partition contains a minimal operating system that finds the update, verifies its signature, mounts the system partition as writable and applies the update. This can be a full update for the entire system, but typically manufacturers release incremental updates that contain only the files that need to change, plus pre/post install scripts. If you have a rooted device and have changed things on the system partition, these updates may fail or undo customizations such as rooting the device.
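You can watch this process happen by applying an OTA package manually through recovery. A rough sketch, with the caveat that the exact recovery menu options and the update file name vary by device:

$ adb reboot recovery             # boot into the recovery partition
# ... choose "Apply update from ADB" in the recovery menu ...
$ adb sideload ota-update.zip     # recovery verifies the signature and patches /system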

The entire system partition can be thought of as a single monolithic package. The update and reboot process is similar to the way Windows updates work (although Windows can write to C:\Windows, it usually defers writes until after a reboot to avoid conflicts with locked files). Unlike Windows, the base of every Android device may be very different. Google simply can’t release updates every second Tuesday like Microsoft. Each manufacturer must rebase its work on the releases made by Google and then send out updates for its devices. This leaves security vulnerabilities unpatched on some devices for weeks or months, if they’re ever patched at all.

Although the update process is similar for iOS products, because Apple has such tight control between its hardware and software, it controls the complete release system for all of its devices. There are no customizations for individual manufacturers. There are only a handful of devices to support and they are all made by Apple. The trade-off for potentially faster and more comprehensive security updates is a lack of device choice and customization.

A General Purpose Base

Last year Microsoft decided to bundle Candy Crush with its Windows 10 operating system1, the first of a plethora of promoted apps2, lock screen ads3 and other bloatware added to its base operating system. Despite the many complaints about Microsoft over the past two decades, at least its base install of Windows used to be fairly bloatware-free.

A base Android install can also be bloatware-free, but these stock versions are only officially made for Google’s line of Nexus devices, as well as some limited Google Play editions of phones. There are many 3rd party mods for Android devices, such as CyanogenMod or MIUI, which, although they do reduce the overall bloat of most carriers’ Android installs, must be individually ported to each Android device.

So what prevents Android from being more like Windows? Why can’t a user simply install AOSP onto his or her device? One of the major issues in porting involves drivers, and this is another problem where standard packages could come to the rescue. In the Linux world, there are things known as source packages, such as source debs on Debian-based systems or SRPMs on RPM-based systems. Google could create a custom yet standardized format for packaging drivers, as well as a toolchain, shipped as part of the Android SDK, for building those drivers for a specific device and kernel.
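Desktop Linux already has a rough analogue in DKMS, which rebuilds out-of-tree drivers for each installed kernel. A sketch of that workflow, assuming a hypothetical driver whose source (with a dkms.conf) has been placed in /usr/src/example-wifi-driver-1.0:

$ sudo dkms add example-wifi-driver/1.0
$ sudo dkms build example-wifi-driver/1.0
$ sudo dkms install example-wifi-driver/1.0

Something similar, standardized by Google for Android kernels, could let a stock AOSP image rebuild a handset’s drivers instead of waiting on the manufacturer.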

Hardware manufacturers don’t even have to include the actual source code for their drivers. Nvidia4 and ATI both have proprietary video drivers for Linux that contain a shim layer linking Linux kernel functions with their closed binary blobs. The linux-firmware project is another example, where manufacturers provide binary blobs of closed source firmware that can be loaded by the Linux operating system as needed for specific devices.

I realize I’m oversimplifying things a bit, and there are many other issues involved in porting custom Android ROMs to different phones and devices than just the kernel and drivers. Still, Google does have considerable control over many of these manufacturers thanks to the Open Handset Alliance (OHA), an exclusive agreement manufacturers must sign if they wish to distribute Google Apps with their devices. Google does have the ability to create standards by which an Android base, similar to a base Windows install, is achievable.

Google briefly ventured into creating bloatware-free versions of devices by licensing Google Play editions of phones. These were premium devices from major manufacturers that didn’t include branding or carrier customizations, essentially making them similar to stock Nexus devices5. Although a step in the right direction, Google killed off this line of phones in 20156.

Conclusions

The current state of the Android OS is unmaintainable. Imagine if, for every Windows update, your device manufacturer (e.g. Dell, HP or Lenovo) had to integrate that update into their Windows code base and then release a patch for your specific Dell or HP version of Windows. The reality is that many power users wipe the devices they purchase to purge their machines of manufacturer-installed bloatware.

Google’s preview of Android N does support background security updates7, which addresses one of the major issues caused by Android fragmentation. However, it still doesn’t help all the existing devices whose manufacturers will never bother with an Android N update, and it still doesn’t provide a standard, unified base platform for Android.

Android started out on the G1 (a.k.a. the HTC Dream), a phone with just 256MB of internal storage. An embedded design approach may have made sense at the time, allowing for a secure device with a minimal footprint. Android devices have since become considerably more powerful, some even rivaling consumer laptops, and the embedded model has become one of Android’s biggest limitations. To truly solve the fragmentation issue, Google needs to create a set of standards by which Android is treated as a more general purpose operating system. A unified base system, hardware standards and driver packaging can lead the way to a more streamlined, bloat-free and upgradeable device ecosystem with less fragmentation.

ScalaTest: BeforeAndAfterAll Does Not Work

Scala

There’s nothing quite like not being able to get something to work the way it should, and implementing a terrible hack instead. It may work for now, but you can only kick that can so far down the road. Recently a coworker discovered one of my terrible hacks, and after months of kicking the can, I finally had to figure it out. The answer involved a long journey, ending in changing a series of hyphens (-) to the in keyword. It wasn’t a bug, just an oddity of the way that the ScalaTest framework works.

The following ScalaTest code passes:

class BeforeAndAfterWorks extends FreeSpec with Matchers with BeforeAndAfterAll {
  var example = false

  override def beforeAll() = {
    example = true
  }

  override def afterAll() = {
    example = false
  }

  "Some Test Set" - {
    "should pass" in {
      example shouldBe true
    }
  }
}

However the following fails on the matcher example shouldBe true:

class BeforeAndAfterDoesNotWork extends FreeSpec with Matchers with BeforeAndAfterAll {
  var example = false

  override def beforeAll() = {
    example = true
  }

  override def afterAll() = {
    example = false
  }

  "Some Test Set" - {
    "should pass"  - {
      example shouldBe true
    }
  }
}

In both cases, I use the BeforeAndAfterAll mixin to correctly set the value of the example variable before the test begins. If you’re trying to play spot the differences, you should pay more attention, because the answer was in the opening paragraph. It all comes down to replacing the - with in.

Looking through the ScalaTest source code, we find that - is really just a wrapper for registerNestedBranch, while in is a wrapper for registerTestToRun. The hyphen is meant to hold collections of sub-tests in a tree-like structure, while in actually registers the following block as a unit test. It’s possible to have nested branches, but you cannot nest in blocks. These in blocks also uniquely identify tests by their string, preventing tests with duplicate names, as shown in the following error output:

[error] Could not run test org.penguindreams.DuplicateTests: org.scalatest.exceptions.DuplicateTestNameException: Duplicate test name: Some Test Set should pass
...
[info] - top in clause *** FAILED ***
[info]   An in clause may not appear inside another in clause. (NestedIn.scala:19)
 (BeforeAndAfterWorks.scala:21)

The issue arises because ScalaTest does allow matchers/assertions to occur within the tree but outside of an in block. Code placed in that space runs while the suite is being constructed and its tests registered, so it is not covered by the BeforeAndAfterAll or BeforeAndAfterEach traits.

This issue took a while to debug. After I figured it out, and unleashed a storm of rage and profanity, I created a merge request for my co-workers to review. One of the comments added to it read, “Too funny - well not really. I ran into this same issue once.”

I haven’t delved deeply into ScalaTest’s internals yet, but I’m guessing that mitigating this particular situation may not be possible within its architecture. In any case, hopefully this post will help other developers who get stuck in the same situation, attempting to figure out why their pre-test setup functions do not run correctly in their test specifications. The code used in these examples can be found at https://gitlab.com/djsumdog/freespec-beforeandafter-notrunning-example.

The Philosophy of Open Source in Community and Enterprise Software

Open Source Initiative (OSI) Logo

The idea behind open source software is a simple one. Developers decide to make the source code for their software available for free, for everyone to use, modify and redistribute. However, not all open source licenses require modified versions to be released back to the community. Many projects today symbolically adopt the banner of open source while their primary motivation is product monetization over building community. Some go as far as making their products difficult to use without paid support, or even removing critical features and placing them in an enterprise version. We’re going to take a look at commercial/open business models implemented by companies like Alfresco, Typesafe, Apple, Google and others. We’ll examine how they fit in with various open source philosophies of the past and where we are likely to go in the future.

Enterprise Edition: The Return of Demo-ware

Recently Lightbend (formerly Typesafe) decided to remove support for Microsoft SQL Server from Slick, their open source database abstraction library, and place it into their commercial closed-source version1. This has led to a fork known as FreeSlick2, which tries to build upon the removed drivers and further maintain them with test cases. Lightbend’s commercial nature and focus on keeping Scala running on the Java Virtual Machine have led to forks of the Scala toolchain in the past3.

Alfresco Community Edition, an open source document management system, once had the components within it to support clustering. Although it was never officially supported in the Community Edition, it was available and there were many guides showing how to enable it. Starting with 4.2a, all of this support was completely removed and placed entirely in the subscription based Enterprise Edition4. At the time, I was working for an open source company that was considering back-porting cluster support in the form of an open source plugin.

Companies like Typesafe, Alfresco and Magento (an e-commerce system) operate on an open source model that provides both a community and an enterprise version. They often benefit from contributions to their community editions in the form of extensions and plugins made by their user base, while still maintaining a proprietary commercial version with a more extensive feature set.

Apple and Google: A Clash of Licenses

Let’s look at two of the largest players, Apple and Google. Apple’s OS X operating system made huge waves on its release in 2001. It was a complete architectural departure from OS 9 and it had an open source base: Darwin, built on BSD. People mistakenly equated the Mac with being Linux based or Linux powered. In reality, it had a very loose UNIX base on which Apple added a lot of proprietary non-UNIX subsystems. Graphics are not rendered by an X server, but instead by their proprietary Aqua subsystem (although an X11 server could be run on top of Aqua to run traditional *NIX applications). Although attempts were made at various points to add a more modern file system to OS X, such as ZFS, to date Apple still uses the ancient HFS+ file system; a file system that blogger Jody Ribton has argued is so old and proprietary that it’s dangerous to use for serious data integrity applications5.

The use of the Berkeley Software Distribution (BSD) license is also a significant choice for Apple. The BSD license is much more permissive and allows for commercial reuse without requiring release of the modified source code. The Free Software Foundation’s (FSF) alternative to the BSD license is the GNU General Public License (GPL). The GPL, like the BSD license, allows code to be modified and used for commercial ventures, but it requires derivatives of the original code to be released back to the community. By using software that is licensed as BSD or one of the BSD derivatives, Apple doesn’t have to release the source of any software it modifies back to the original author or community.

Over the course of the last several releases of OS X, Apple has slowly been removing GPL software. Where Mac OS X 10.5 contained 47 GPL licensed packages, 10.10 contains only 18! Even within those 18 packages, the shipped version of bash is 3.2, from 2006. The current version is 4.2.10, but 3.2 was the last version shipped under GPL version 2. GPL version 3 has two major clauses that would hurt Apple. The first prohibits bringing patent lawsuits against people for using the GPLv3 software you produce and ship. The second prevents locking down hardware so that it cannot run custom software6.
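This is easy to verify on any Mac: the bundled shell still identifies itself as the 3.2 line. The output below is illustrative and the exact patch level depends on the OS X release:

$ /bin/bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin15)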

On the flip side, Google’s Android platform actually runs a Linux kernel as its base. Like Mac OS X, Android does a lot of things differently from the standard Linux model and uses its own subsystems for startup, hardware abstraction, graphical display and device management. Google didn’t actually create Android. Like many of its other products, it purchased the company that created it and incorporated it into its ecosystem. Google has maintained an open source version of Android, known as the Android Open Source Project (AOSP). All handset providers use AOSP as their base, modify it to their needs and add their own proprietary, often not open source, applications.

By signing an agreement with Google, manufacturers can distribute GApps, the basic set of Google applications and services including the Google Play Store, with their devices. (Amazon doesn’t provide Google apps on its Android devices, instead utilizing its own app store and services.) Over the course of the past few years, Google has stopped updating most of the AOSP/open source versions of its applications. The search, calendar, music and camera apps are just a few examples where the AOSP versions have been practically frozen, with new features only being added to the proprietary Google version of each respective app. Even though the Google apps are free, they’re closed source and are dependent on installing the proprietary Google Services and Play Store7.

As previously mentioned, Amazon has its own app store for its line of Kindle Fire devices. App writers that depend on either Google or Amazon services often need to build multiple versions of their apps for the various stores. Furthermore, companies that are allowed to distribute Google Apps enter into the Open Handset Alliance (OHA), a strict license agreement with Google banning those companies from producing anything Google considers non-compatible. This means that OHA companies can never produce an Amazon-only device, nor can they add support for alternatives to Google services, as Motorola and Samsung discovered when they attempted to use Skyhook for location services instead of Google’s. Skyhook sued Google in 2010 and, as of 2014, the case had yet to go to trial8.

Google apps and the OHA come with locked-in, anti-competitive stipulations that keep a useful Android OS within Google’s grip. It takes a large player, like Amazon, to even attempt to offer an alternative based on the same underlying open source base. Samsung often releases its own versions of standard apps, along with its own app store, alongside the Google apps. Some believe this is a half-and-half strategy to build up its own app market so that it may one day be able to break away from the Google ecosystem and the OHA if necessary.

This type of situation, involving lock-in to proprietary commercial software, was one of the principal driving forces for creating licenses like the GNU GPL that require derivative works to carry an equal license. However, even with such protections, Google has found a way to build a closed-source proprietary structure on top of GPL software, using agreements over the means of app distribution to build a closed software ecosystem. Apple takes the alternative route of avoiding anything with a license that could limit the proprietary nature of its software and its intellectual property rights. It has gone so far as to rewrite major components itself to distance itself from GNU GPLv3 code.

Microsoft

Windoze 2495 — by Chris Pirazzi

In the early 2000s, Microsoft was the big evil enemy. It was facing anti-trust litigation from countries all over the world. It locked its OEMs into agreements that kept them from selling dual-boot systems, similar to the situation with Google’s OHA today. As a result, Be Inc was prevented from selling dual-boot BeOS/Windows systems9. The one thing Microsoft couldn’t compete against was Linux. Many Linux distributions were non-commercial and open source, leaving no one to directly compete with or sue. Many developers and people in tech took up the Linux torch with almost religious devotion, as the alternative to the Windows desktop monopoly. So Microsoft took up arms against companies who made commercial offerings using Linux.

In 2003, SCO sued several Linux companies for violating its intellectual property rights, claiming that part of the Linux kernel used SCO UNIX source code. It was discovered that SCO was getting quite a bit of funding from Microsoft, leading some to believe that Microsoft was pushing the SCO lawsuits against Linux while Microsoft itself was also suing various open source companies10.

Open Source Stickers on Microsoft Keyboards - Art Exhibit - London

Fast forward a few years and Microsoft is no longer on top of the world. Its share of the web browser market was being lost to Firefox, long before Google brought the Chrome browser onto the scene. Even though Microsoft pioneered the ability to view satellite images of the entire planet with Terraserver, Google actually made the concept useful to individuals with its Google Maps software and the purchase of Google Earth11. With the lackluster uptake of Windows Vista and Windows 8, Microsoft’s share of the consumer PC market has lost ground to Apple and Linux. Its mobile offering has such low uptake that Mint, American Airlines, Pinterest, Bank of America, NBC and many other high profile web presences have either stopped updating or completely removed their Windows Mobile apps12.

Microsoft has recently done a complete turnaround when it comes to open source. In 2014, it open sourced its entire .NET runtime. This was followed by open sourcing the Roslyn .NET compiler and releasing the free Visual Studio Community edition13. Partnerships with both Red Hat and Cyanogen are further evidence that Microsoft is making drastic changes to try to maintain footing in today’s market. It’s a far cry from an earlier time, when Bill Gates wrote an open letter to the hobbyists of the Homebrew Computer Club in 1976 accusing them of software theft for making copies of Microsoft’s BASIC14.

Where Microsoft once tried to stifle the growth of open source alternatives, now it must embrace them to stay relevant in a market where it is rapidly losing share to competitors. Keep in mind that its largest competitors are moving in the other direction entirely. As discussed previously, Google and Apple may have originally based their products on open source software, but now they are slowly moving to keep the proprietary core functionality of their central infrastructures and software ecosystems in closed, contractually managed services.

The Hands of Commercial and Open Source

There is a considerable amount of ideology around open source and the idea that code should be free. In the late 90s and early 2000s, there was a great feeling that open source was taking on Microsoft and a Windows dominated landscape. However, even during this time, many open source projects had significant commercial funding.

In 2001, IBM’s dedication to Linux resulted in an attempted viral “Peace, Love, Linux” chalk art campaign. The marketing gimmick resulted in a $100,000 graffiti fine from the city of San Francisco when it was discovered the chalk didn’t wash away for months15. The Linux kernel gets major contributions from commercial entities such as Red Hat, IBM, Intel and many others, allowing them to employ people who work full time on kernel development16. In 2008, the Mozilla Foundation earned over 88% of its revenue from Google, mostly via search related royalties, as Google was the default search engine at the time17.

I don’t mean to downplay the community around open source. There is still plenty of academic and volunteer development going on; that is, software written by developers in their spare time or at universities. Although monetizing projects has helped many of them grow or stay alive, the trend of having both open source and commercial interests has led to contention in many communities.

“Community made open source software needs people to be able to take out what they’ve put in. Ubuntu’s licenses and policies enforce this. However for the last three years Ubuntu’s main sponsor Canonical has had a policy contrary to this and after much effort to try to rectify this it’s clear that isn’t going to happen. The Ubuntu leadership seems compliant with this so I find myself unable to continue helping a project that won’t obey its own community rules and I need to move on.” -Jonathan Riddell, former release manager for Kubuntu18

When GitLab responded to their community with improvements to their software, certain features were only present in their enterprise edition. The code for their enterprise edition was visible, but not open (it requires a license to run). In particular, developers for the VideoLAN Client (VLC) media player refused to take advantage of the enterprise edition, not wanting to use any proprietary/non-free software19.

A number of companies, such as Atlassian and JetBrains, provide free licenses for their commercial products to open source developers. Much of their software is based on open source libraries and tools, so this can be seen as giving back to the community. However, the free licenses they offer are still for closed software that is built upon open source technologies.

In 2015, Drew DeVault, a developer and technology blogger, encouraged developers to stop using Slack for open source projects due to its closed nature20. Even with its limitations, the Internet Relay Chat (IRC) protocol is open and well established, and it helps facilitate open communication between developers and their communities. There are also several newer open protocols, such as IRCv3 and XMPP, which attempt to bridge the gap in chat systems using open standards.

The conflicts between free and non-free software can get pretty contentious. Canonical maintains Ubuntu while marketing proprietary products like Landscape. Linux and its founder, Torvalds, sit firmly in the camp opposite Stallman and the FSF, keeping the Linux kernel under the GPLv2 license with no intent to upgrade to GPLv321. Stallman has openly criticized Clang/LLVM for not truly understanding the open source landscape and for creating software that advances proprietary software22, while the authors of Clang claim they cannot build useful tools with GCC23. The rabbit hole continues as far as you want to go.

“Mark has repeatedly asserted that attempts to raise this issue are mere [Fear Uncertainty, Doubt], but he won’t answer you if you ask him direct questions about this policy and will insist that it’s necessary to protect Ubuntu’s brand. The reality is that if Debian had had an identical policy in 2004, Ubuntu wouldn’t exist. The effort required to strip all Debian trademarks from the source packages would have been immense, and this would have had to be repeated for every release. While this policy is in place, nobody’s going to be able to take Ubuntu and build something better. It’s grotesquely hypocritical, especially when the Ubuntu website still talks about their belief that people should be able to distribute modifications without licensing fees.” -Matthew Garrett24

The Philosophy of Open Source

The GNU General Public License grew out of a philosophy that software should be free, and that derivatives of that software should be free as well. It was a contrast to the industry standard of commercial software, shareware, demoware and crippleware at the time. But today there are many products that pay lip service to open source concepts, having both open and enterprise versions very similar to the demoware/shareware concepts of the past. Some use GNU GPL software as a base for an entirely commercial ecosystem. The original philosophies behind the Free Software Foundation have been commandeered by those who seek to profit from open source while returning only marginal or symbolic contributions to the community, as they enter patent lawsuit wars with one another.

There have been significant contributions by commercial entities to the underlying technology stacks used to build many of the compiler, web, testing and data storage frameworks. It’s important to note that many of these contributions have been released under permissive licenses that are more accessible for commercial use. Rather than embrace the GNU GPL, today most libraries and programming languages are released under licenses such as MIT, Apache, BSD and others. Even as I was writing this article, Android’s Native Development Kit (NDK) changed its default compiler from the GNU C Compiler (GCC) to Clang, noting in its changelog that GCC is now deprecated25. Clang has been contributed to by Apple, Microsoft, Google, ARM, Sony and Intel. It is, of course, under a permissive license, the University of Illinois/NCSA License.

What we’re essentially seeing more of in the industry today is the open sourcing of central technologies and reusable components without open sourcing the products built with them. This can help startups and small developers create large, scalable products. However, those products are often locked into the ecosystems of the larger players. It’s beneficial for Facebook, Twitter and Amazon to help others create applications that feed people into their user bases, because it’s more likely those products will depend on connections to the big networks rather than compete with them.

Even though there is considerably more open source software deployed in the wild today than in the previous decade, making its way into everything from server clusters to consumer electronics, it is a far departure from the world originally envisioned by some of the most vocal open source advocates and anti-commercial zealots of the late 1990s. The original idea behind the Free Software Foundation’s concept of open source was an ecosystem where every new development resulted in more open and free code. We’re talking about people who truly believed in getting away from commercial software entirely. There were people who felt that one day Linux desktops could replace Windows, and that even high end tools like Photoshop and Final Cut would have great open source replacements.

This never happened. Although Linux, FreeBSD and other open source operating systems are great for developers, we never truly had the year of the Linux desktop. Anyone who claims that Gimp is just as good as Photoshop, and that any missing feature is simply a matter of not being familiar with Gimp’s interface, is either in a state of denial or hasn’t had to do truly intense graphics work beyond cropping images or adjusting levels. Today, Illustrator is still far easier to learn than Inkscape, and LibreOffice Writer, while an excellent program, still lacks many of the advanced features of Microsoft Word. Most video editing applications for Linux suffer continual crashes, leaving Blender, a 3D modeling tool not originally designed for video editing, as the only stable video editing platform. Although Steam and Humble Bundle have brought a plethora of independent and mainstream games to Linux, those games are closed source, commercial and, in the case of Steam, under Digital Rights Management (DRM).

Rather than the computing utopia where all software is free and hardware is what people pay for, much of the software we use today is being moved out of the realm of pay-once desktop software and into subscription based offerings. Where one could once simply keep using an older version of a piece of software and skip an upgrade that wasn’t necessary, now people must pay continual subscriptions for software that can never be owned, only rented, for their entire lives.

Although companies like JetBrains provide fallback licenses that allow people to keep using a previous version if they stop subscribing26, other companies like Adobe will simply cut off access to the current product, even if you’ve been a subscriber for years27. Furthermore, a lot of software is moving entirely to the web. These web applications are often based on open source components while creating closed, walled-garden systems. People who then develop applications against those web services, say using the Dropbox or Facebook public APIs, are locked into those systems.

Open source software is alive and well, backing most of the systems we take for granted every day. Communities like GitHub have paved the way for more open collaboration and increased contributions. More software today is branded with the marketing gimmick of being moved “into the cloud” and into subscription models where people perpetually rent software rather than purchase it. Many of the websites we use are walled gardens of free services that are not open, and which make it intentionally difficult to move your data should you become unsatisfied with the service provider. Much of the open source software being released today is backend technology or developer tooling. We are still a far cry from having the day-to-day software we use be truly free, not only in cost, but in being able to modify it to our needs and run it anywhere we want.

  1. bring back SQLServerDriver. 23 Jan 2015. fommil. slick/github.

  2. FreeSlick

  3. Typelevel Scala and the future of the Scala ecosystem. 2 September 2014. Sabin.

  4. What’s going on with Alfresco clustering? 17 October 2012. Potts. ECM Architect.

  5. HFS is Crazy. 4 December 2015. Ribton. Liminality.

  6. Apple’s Great GPL Purge. 23 Feb 2014. Matthew. ath0.

  7. Google’s iron grip on Android: Controlling open source by any means necessary. 21 Oct 2013. Amadeo. Arstechnica.

  8. Skyhook v. Google patent trial slips into 2014 as result of consolidation of two lawsuits. 29 March 2013. Foss patents.

  9. BeOS will live on as Microsoft settles legal action. 12 September 2003. Thibodeau. Computer Weekly.

  10. A Look at the Microsoft-funded SCO Lawsuit in Light of Newer Anti-Linux Microsoft Lawsuits. 7 August 2009. Schestowitz. Techrights.

  11. Microsoft Invented Google Earth in the 90s Then Totally Blew It. 13 November 2015. Koebler. Motherboard/Vice.

  12. Windows Phone has a new app problem. 23 October 2015. Warren. The Verge.

  13. Opening up Visual Studio and .NET to Every Developer, Any Application: .NET Server Core open source and cross platform, Visual Studio Community 2013 and preview of Visual Studio 2015 and .NET 2015. 12 November 2014. Somasegar. Microsoft.

  14. Bill Gates’ 1976 Letter About Software Piracy. 27 December 2008. Barker. Gadgetopia.

  15. IBM gets $100,000 fine for ‘Peace, Love and Linux’ campaign. 28 November 2001. Lemos. ZDNet.

  16. Who funds Linux development?. 17 April 2009. TheNumerator. GCN.

  17. Google Makes Up 88 Percent Of Mozilla’s Revenues, Threatens Its Non-Profit Status. 19 Nov 2008. Schonfeld. TechCrunch.

  18. Jonathan Riddell Stands Down as Release Manager of Kubuntu. 23 October 2015. Kubuntu.

  19. Dear open-source maintainers, a letter from GitLab (HackerNews comments). 18 January 2016. (Retrieved 15 August 2016). HackerNews.

  20. Please don’t use Slack for FOSS projects. 1 November 2015. DeVault.

  21. Torvalds: No GPL 3 for Linux. 30 January 2006. Shankland. Cnet.

  22. Re: clang vs free software. 24 January 2014. Stallman. GCC Mailing List.

  23. GoingNative 2012 Clang Defending C++ from Murphy’s Million Monkeys. 2012. (Video)

  24. If it’s not practical to redistribute free software, it’s not free software in practice. 19 November 2015. Garrett.

  25. Changelog for NDK Build 2490520. Retrieved 23 December 2015.

  26. What is perpetual fallback license?. 28 December 2015. JetBrains.

  27. What happens to my work when I cancel my subscription? 7 May 2013. Adobe Communities. (Forum Thread)

Windows 10 Update KB3176493: All My Drivers Disappeared

Windows 10

Most of the time I spend on my computer is in the Linux world; however, I do have a Windows laptop for the non-open applications I need to use from time to time. One of those applications is the video conferencing tool I use for work. Last Wednesday I was working from home and switched to my Windows laptop to prepare for the morning scrum call, only to find Windows had decided to update and restart itself. Annoying, but not a big deal, until I logged in and realized that the drivers for networking, Bluetooth, USB audio and USB video were all disabled.

Windows 10 Device Manager Screenshot

I was able to download Wi-Fi and Bluetooth drivers to a USB stick and reinstall them on my Windows laptop. However, USB audio and video devices use drivers that are built directly into the Windows operating system. Whenever I attempted to update these drivers in Device Manager, I’d get the following error:

Error message when attempting to update drivers

Through a series of searches, I eventually discovered a log file named C:\Windows\INF\setupapi.dev.log:

>>>  [Device Install (DiShowUpdateDevice) - USB\VID_262A&PID_1100&MI_01\6&213CF812&1&0001]
>>>  Section start 2016/08/17 14:19:36.912
      cmd: "C:\WINDOWS\system32\mmc.exe" "C:\WINDOWS\system32\compmgmt.msc" /s
     dvi: {DIF_UPDATEDRIVER_UI} 14:35:31.130
     dvi:      Default installer: Enter 14:35:31.133
     dvi:      Default installer: Exit
     dvi: {DIF_UPDATEDRIVER_UI - exit(0xe000020e)} 14:35:31.137
     ndv: {Update Driver Software Wizard for USB\VID_262A&PID_1100&MI_01\6&213CF812&1&0001}
     sto:      {Setup Import Driver Package: c:\windows\inf\wdma_usb.inf} 14:35:35.293
!    sto:           Unable to determine presence of driver package. Error = 0x00000002
     inf:           Provider: Microsoft
     inf:           Class GUID: {4d36e96c-e325-11ce-bfc1-08002be10318}
     inf:           Driver Version: 10/29/2015,10.0.10586.0
     sto:           {Copy Driver Package: c:\windows\inf\wdma_usb.inf} 14:35:35.308
     sto:                Driver Package = c:\windows\inf\wdma_usb.inf
     sto:                Flags          = 0x00000007
     sto:                Destination    = C:\Users\myusername\AppData\Local\Temp\{360925e0-0e26-7643-bd01-ff3a8203caf7}
     sto:                Copying driver package files to 'C:\Users\myusername\AppData\Local\Temp\{360925e0-0e26-7643-bd01-ff3a8203caf7}'.
     flq:                Copying 'c:\windows\inf\wdma_usb.inf' to 'C:\Users\myusername\AppData\Local\Temp\{360925e0-0e26-7643-bd01-ff3a8203caf7}\wdma_usb.inf'.
!!!  flq:                Error installing file (0x00000002)
!!!  flq:                Error 2: The system cannot find the file specified.
!    flq:                     SourceFile   - 'c:\windows\inf\USBAUDIO.sys'
!    flq:                     TargetFile   - 'C:\Users\myusername\AppData\Local\Temp\{360925e0-0e26-7643-bd01-ff3a8203caf7}\USBAUDIO.sys'
!!!  cpy:                Failed to copy file 'c:\windows\inf\USBAUDIO.sys' to 'C:\Users\myusername\AppData\Local\Temp\{360925e0-0e26-7643-bd01-ff3a8203caf7}\USBAUDIO.sys'. Error = 0x00000002
!!!  flq:                SPFQNOTIFY_COPYERROR: returned SPFQOPERATION_ABORT.
!!!  flq:                Error 995: The I/O operation has been aborted because of either a thread exit or an application request.
!!!  flq:                FileQueueCommit aborting!
!!!  flq:                Error 995: The I/O operation has been aborted because of either a thread exit or an application request.
!!!  sto:                Failed to copy driver package to 'C:\Users\myusername\AppData\Local\Temp\{360925e0-0e26-7643-bd01-ff3a8203caf7}'. Error = 0x00000002
     sto:           {Copy Driver Package: exit(0x00000002)} 14:35:35.362
     sto:      {Setup Import Driver Package - exit (0x00000002)} 14:35:35.365
!!!  ndv:      Driver package import failed for device.
!!!  ndv:      Error 2: The system cannot find the file specified.
     ndv:      Installing NULL driver.
     dvi:      {Plug and Play Service: Device Install for USB\VID_262A&PID_1100&MI_01\6&213CF812&1&0001}
!    ndv:           Installing NULL driver!
     dvi:           {DIF_ALLOW_INSTALL} 14:35:35.****
     dvi:                Default installer: Enter 14:35:35.421
     dvi:                Default installer: Exit
     dvi:           {DIF_ALLOW_INSTALL - exit(0xe000020e)} 14:35:35.421
     dvi:           {DIF_REGISTER_COINSTALLERS} 14:35:35.422
     dvi:                Default installer: Enter 14:35:35.422
     dvi:                Default installer: Exit
     dvi:           {DIF_REGISTER_COINSTALLERS - exit(0x00000000)} 14:35:35.423
     dvi:           {DIF_INSTALLDEVICE} 14:35:35.423
     dvi:                Default installer: Enter 14:35:35.424
!    dvi:                     Installing NULL driver!
     dvi:                     Install Null Driver: Removing device sub-tree. 14:35:35.425
     dvi:                     Install Null Driver: Removing device sub-tree completed. 14:35:35.429
     dvi:                     Install Null Driver: Restarting device. 14:35:35.431
     dvi:                     Install Null Driver: Restarting device completed. 14:35:35.435
     dvi:                     Install Device: Starting device. 14:35:35.435
     dvi:                     Install Device: Starting device completed. 14:35:35.448
     dvi:                Default installer: Exit
     dvi:           {DIF_INSTALLDEVICE - exit(0x00000000)} 14:35:35.448
     ump:      {Plug and Play Service: Device Install exit(00000000)}
     ndv: {Update Driver Software Wizard exit(00000002)}
<<<  Section end 2016/08/17 14:35:37.358
<<<  [Exit status: FAILURE(0x00000002)]

So it seemed that the latest patch had somehow removed USBAUDIO.sys, or caused the INF file to search for it in the wrong location (it’s actually located in C:\Windows\System32\Drivers). However, copying the file to the location mentioned in the logs would give me an error about the drivers not being signed. I’d get the same error for the Microsoft WiFi Miniport Adapter.

Unsigned driver error when attempting to update Wi-Fi screenshot

I assume the latest update broke my system’s USB support. Running the following command generates a listing of recently installed Windows updates1:

wmic qfe list brief /format:htable > "%USERPROFILE%\hotfix.html"
List of recent updates

I attempted to uninstall the most recent update. I still couldn’t install USB audio/video drivers. I then attempted to roll back to a system restore checkpoint. Some of my applications that depended on the C++ runtimes installed at that checkpoint stopped working and had to be reinstalled, but the USB issue still persisted.

Uninstalling security update screenshot

Any searches I performed for this issue gave me a pretty big spread of results, with multiple issues dating back to Windows 7 and XP. A post to SuperUser2 on the issue went unanswered and another post on Microsoft Answers was given a worthless cookie-cutter response3. The only real support I got was from a supposed Microsoft employee (johnwinkmsft) on Reddit who at least attempted to seriously look at my issue4.

Eventually I started performing searches relating directly to the Anniversary Update. I found a post somewhere that suggested running Microsoft’s Windows10Upgrade28084.exe, which is a manual installer for the Anniversary Update. Even though I had checked for updates multiple times with the built-in Windows Update tool, when I ran the Anniversary Update installer, it said my build was behind. I allowed it to install the update and rebooted. Afterwards, I went through the annoying first-time boot screens, in which I once again had to be sure to disable all the ways Microsoft seeks to monitor my personal usage information.

Custom Windows 10 Privacy Settings Screenshot 1
Custom Windows 10 Privacy Settings Screenshot 2

When I finally got back to the desktop, all my USB devices were once again working. The summary for KB3176493 mentions security fixes for kernel mode drivers5. I suspect that KB3176493 may have been dependent on the Anniversary Update build of Windows 10, and that the signing key, or the location of key drivers, changed between the two builds. Still, that doesn’t explain why the issue wasn’t resolved by uninstalling the update or dropping back to a system restore point.

I’m not a Windows system expert by any means. Had this been an issue on my Linux system, I’d have had the domain knowledge to diagnose it a lot faster. I also believe I would have had a better response from the developer community, such as when I made several posts on various mailing lists and bug trackers in an attempt to get Wi-Fi and Bluetooth working on my laptop. I can understand why Windows support is much more difficult, with a much larger installation base of mostly non-tech users. I cannot recall any past security update for Windows breaking my system, or at least not as severely as this one did. The number of issues users have been reporting with the Windows 10 Anniversary Update, as well as the KB3176493 patch causing printing bugs6, is troublesome. Still, I’m glad I continued to diagnose this issue and didn’t resort to the 1990s solution of reinstalling Windows.

Aspell and Hunspell: A Tale of Two Spell Checkers

Feel so Good - The Spelling Mistakes (1980 - Album Art)

I am a terrible speller. Every few words I find myself hitting the menu key to correct some word staring at me with its squiggly red line. This proved to be horribly difficult back when I used MacOS, which lacks a menu key and requires the user to find the spell correcting shortcut for each individual application (if one even exists). In the Linux world, I’ll often open a terminal and run aspell -a when the traditional spell check fails me. Aspell is remarkably better at correcting my poor spelling, so why then do most Linux applications use the terrible checking provided by Hunspell? Both Aspell and Hunspell are replacements for the much older International Spell, or Ispell. When run through their interactive command line tools, they even proclaim to be Ispell (but really Aspell/Hunspell) in their output. To illustrate my issues with Hunspell, let’s take some examples of my terrible spelling mistakes.
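The comparisons below come from each checker’s Ispell-compatible pipe mode; roughly, something like:

$ echo "excersized" | aspell -a
$ echo "excersized" | hunspell -a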

Take, for example, the misspelled word excersized.

@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6.1)
excersized
& excersized 8 0: exercised, excised, excesses, exercises, exorcised, exercise's, exercisers, exerciser's

@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
excersized
& excersized 1 0: supersized

Aspell provides the correct spelling for my butchered attempt as the first option. Hunspell fails to provide the correct spelling at all.

Another example is the misspelled word enterperuner:

@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6.1)
enterperuner
& enterperuner 5 0: entrepreneur, Enterprise, enterprise, interpreter, interferon

@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
enterperuner
& enterperuner 2 0: interpenetrate, serpentine

Once again, Aspell shines, providing the correct word as the very first option. Hunspell fails to provide the correct word at all.

We see the same thing with presuing:

@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.6.1)
presuing
& presuing 11 0: pressing, perusing, presuming, presiding, pressuring, praising, reusing, pursuing, presaging, pressings, pressing's

@(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.3)
presuing
& presuing 6 0: presuming, pressing, pressuring, presupposing, pressurizing, presiding

On the official Aspell website, there are test results showing how Aspell 0.60.6 outperformed Hunspell 1.1.121. Although the test data only contained English words, one of Aspell’s big improvements over Ispell was UTF-8 encoding throughout its code base. This allowed Aspell to suggest corrections for words with international characters, such as umlauts in German2. However, Hunspell supports UTF-8 natively as well.

So why is Hunspell used everywhere, even though Aspell seems to have superior correction capabilities? One of the issues is that Aspell is no longer maintained. Its last release was in 20113, whereas Hunspell is actively maintained and used in LibreOffice, Firefox, Thunderbird and Chrome.

So why not make the spelling framework pluggable? After all, many Linux distributions support swapping out backend services, such as with Gentoo’s eselect tool or Debian and Ubuntu’s update-alternatives system. The trouble is that for many of these applications, Hunspell is actually embedded within the application itself. The Atom editor’s node-spellchecker embeds its own copy of Hunspell rather than using the system spell checker, as do LibreOffice, Firefox and others.

Embedded spelling libraries also bring up the issue of licensing. Aspell cannot be used in iOS applications, due to Apple not allowing software with LGPL licenses or dependencies in their App Store4. When Mozilla was considering using Hunspell instead of the aging MySpell, one of the issues was that Hunspell, licensed only under the LGPL, could not be imported into the Mozilla tree5. The Hunspell project later released Hunspell 1.1.3 under a GPL/LGPL/MPL tri-license6, which is another reason for its widespread adoption.

Interestingly enough, there are still a few e-mails trickling through Aspell’s development mailing list. One of them relates to patching Aspell so that it compiles correctly under Clang, a BSD-licensed compiler. However, this patch seems to have been completely ignored78. Without any responses, it’s difficult to tell whether this is due to the lack of a maintainer or to ideological differences over supporting a BSD-licensed compiler.

Aspell was, and still is, a superior spell checker (at least for English dictionaries and my personal use cases). However, it’s also an example of how something with better performance, even in the open source world, can still fail to thrive. With Hunspell being not only the default but the only spell checker in so many tools, many people struggle, unaware that better alternatives exist. Factors like licensing and maintainership have left this truly amazing library in an evolutionary dead end.

  1. Spell Checker Test Kernel Results. Aspell. Retrieved 22 Sep 2016.

  2. Hunspell vs. aspell. Benko. LYX Mailing List. 2009. Retrieved 22 Sep 2016.

  3. ChangeLog - GNU Aspell 0.60.7-pre. Retrieved 7 September 2016. Aspell.net.

  4. Re: [aspell-devel] Need Help. 9 Nov 2011. Da Silva. Aspell Devel Mailing List.

  5. Bug 319778 - (hunspell) Replace MySpell with HunSpell (comment 1) . Leeuwen. Mozilla Bugzilla. 10 Dec 2005.

  6. Bug 319778 - (hunspell) Replace MySpell with HunSpell (comment 4). Timar. Mozilla Bugzilla. 11 Dec 2005.

  7. [aspell-devel] Aspell Clang++ Compilation Patch. 24 February 2016. Hypo Stases. Aspell Devel Mailing List.

  8. [aspell-devel] Aspell Clang++ Compilation Patch. 8 April 2016. Hypo Stases. Aspell Devel Mailing List.

Using the Banana Pi BPI-R1 as a Router With Bananian

Banana Pi BPI-R1 with Enclosure

For the past nine months, I’ve been using a BPI-R1 as a personal home router. It’s a small, affordable router board with a dual-core ARMv7 processor, 1GB of RAM and Gigabit Ethernet. It can run several flavors of Linux, however getting the initial setup going was a little tricky due to the way the Ethernet switch/vlans are configured. The following is a guide to setting up a BPI-R1 using the Bananian Linux distribution. Bananian, as the name suggests, is based on Debian. I used it because it was the first distribution I could get working well. The instructions are pretty standard: use dd to write an image to an sdcard. The installation image and guide can be found on the bananian download page.
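For reference, flashing the image looks roughly like the following. This is only a sketch: the image filename and target device (/dev/sdX) are placeholders, so verify the device name before running dd, since it will overwrite whatever it points at.

# write the Bananian image to the sdcard (replace the placeholders)
dd if=bananian-XX.img of=/dev/sdX bs=1M
sync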

By default Bananian uses DHCP to get an IP address. At the time I didn’t have a monitor or a keyboard to plug into the HDMI/USB ports on the board, so I connected the WAN interface to my laptop where I started a DHCP server to give it an address. After that I was able to SSH into the board as the root user using pi, the default password. If you have a keyboard and HDMI monitor, the following steps can also be done from the console.

You should start by running the bananian-config script for setting your root password, timezone information and, most importantly, configuring the hardware type as BPI-R1 (the default is the standard bananian board).

For security, use either bananian-config or passwd to change the default password for the root user!

The base Bananian comes with both the nano and vi editors. The first thing we’ll want to do is configure the switch by editing /etc/network/if-pre-up.d/swconfig. Open it with the editor of your choice and take note of the line exit 0 as shown:

#!/bin/sh

#---------------------------#
# BPI-R1 VLAN configuration #
#---------------------------#
#
# This will create the following ethernet ports:
# - eth0.101 = WAN (single port)
# - eth0.102 = LAN (4 port switch)
#
# You have to adjust your /etc/network/interfaces
#
# Comment out the next line to enable the VLAN configuration:
#exit 0

ifconfig eth0 up

# The swconfig port number are:
# |2|1|0|4|  |3|
# (looking at front of ports)

swconfig dev eth0 set reset 1
swconfig dev eth0 set enable_vlan 1
swconfig dev eth0 vlan 101 set ports '3 8t'
swconfig dev eth0 vlan 102 set ports '4 0 1 2 8t'
swconfig dev eth0 set apply 1

As the default file indicates, comment out the exit line to enable the switch configuration. The BPI-R1 essentially has only one Ethernet controller; the WAN and LAN ports are designated by splitting the individual ports into their own independent vlans. The LAN ports then need to be bridged together to be used as a switch, which can be done by editing /etc/network/interfaces and configuring the interfaces as follows:

auto lo
iface lo inet loopback

auto eth0.101
	iface eth0.101 inet dhcp

auto eth0.102
	iface eth0.102 inet manual

auto wlan0
	iface wlan0 inet manual

auto br0
	iface br0 inet static
	bridge_ports eth0.102 wlan0
	bridge_waitport 0
	address 10.10.1.1
	network 10.10.1.0
	netmask 255.255.255.0

In this example, I’ve set up my private LAN network at 10.10.1.1. You can obviously use any range you’d like within the private address spaces (e.g. 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16). I’ve also configured the Wi-Fi adapter to be bridged directly with the LAN ports, placing both wireless and wired devices on the same network.

For this configuration to work, we’ll need the bridge utilities. While we’re at it, we can also install the dhcp server and wireless AP tools we’ll need later using the following commands:

apt-get update
apt-get install bridge-utils isc-dhcp-server hostapd hostapd-rtl

You may get some errors about the DHCP service not being able to start because we haven’t configured it yet; it’s safe to ignore these. Next, we’ll edit /etc/dhcp/dhcpd.conf to set up our DHCP server.

ddns-update-style none;
option domain-name-servers 8.8.8.8, 8.8.4.4;

default-lease-time 600;
max-lease-time 7200;
authoritative;
log-facility local7;

subnet 10.10.1.0 netmask 255.255.255.0 {
  range 10.10.1.10 10.10.1.100;
  option routers 10.10.1.1;
}

The above example establishes a pool of IPs for our LAN network. It also relies on Google DNS by using 8.8.8.8 and 8.8.4.4 as the nameservers. You can change this to use your ISP’s nameservers or set up your own instead. The default dhcpd.conf has comments for more complex DHCP options if you require them.
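If you later want particular machines to always receive the same address, ISC dhcpd also supports static reservations in dhcpd.conf. A minimal sketch, using a hypothetical hostname and MAC address:

# pin a specific machine to a fixed address
host fileserver {
  hardware ethernet 00:11:22:33:44:55;
  fixed-address 10.10.1.5;
}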

Next, we’ll set up our Wi-Fi access point by creating /etc/hostapd/hostapd.conf and configuring it with the following:

ctrl_interface=/var/run/hostapd
ctrl_interface_group=0
macaddr_acl=0
auth_algs=3
ignore_broadcast_ssid=0

# 802.11n related stuff
ieee80211n=1
noscan=1
ht_capab=[HT40+][SHORT-GI-20][SHORT-GI-40]

#WPA2 settings
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP

# CHANGE THE PASSPHRASE
wpa_passphrase=changeme

# Most modern wireless drivers in the kernel need driver=nl80211
#driver=nl80211
driver=rtl871xdrv
max_num_sta=8
beacon_int=100
wme_enabled=1
wpa_group_rekey=86400
device_name=RTL8192CU
manufacturer=Realtek

# set proper interface
interface=wlan0
bridge=br0
hw_mode=g
# best channels are 1 6 11 14 (scan networks first to find which slot is free)
channel=6
# this is the network name
ssid=ExampleSSID

In the above configuration, be sure to adjust the wpa_passphrase and ssid for your setup. To get hostapd to use this new configuration, edit /etc/default/hostapd and uncomment the DAEMON_CONF variable.
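On Debian-based systems, the uncommented line typically ends up looking like the following, assuming the configuration path used above:

DAEMON_CONF="/etc/hostapd/hostapd.conf"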

We’re now ready to restart some services with our new configuration. Run the following commands:

/etc/init.d/networking restart
/etc/init.d/isc-dhcp-server restart
/etc/init.d/hostapd restart

At this point, devices connected to the switch ports of the BPI-R1 should be able to obtain IP addresses and Wi-Fi devices should be able to connect as well. However, they won’t be able to access the Internet.

First, edit /etc/sysctl.conf and uncomment the following line to enable IP forwarding:

#net.ipv4.ip_forward=1

This will ensure IP forwarding is enabled after reboots. To enable it right now, run the following:

sysctl net.ipv4.ip_forward=1

Next, we need to add some iptables rules to allow for Network Address Translation (NAT) between our LAN and WAN networks.

iptables -A INPUT -i br0 -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -s 10.10.1.0/24 -i br0 -j ACCEPT
iptables -A FORWARD -d 10.10.1.0/24 -i eth0.101 -j ACCEPT
iptables -t nat -A POSTROUTING -o eth0.101 -j MASQUERADE
iptables -P INPUT DROP
iptables -P FORWARD DROP

In the above example, we start by accepting everything from localhost and our LAN (the physical ports and Wi-Fi bridged together). The next line establishes a stateful firewall; see the iptables documentation for more information on connection tracking. Next come the forwarding and masquerading rules used for NAT, so our LAN can communicate with the outside world. Finally, we set the default policies to drop anything we haven’t explicitly allowed.
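To double-check what’s actually loaded, you can list the active rules and their packet counters with standard iptables invocations:

iptables -L -n -v
iptables -t nat -L -n -v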

If you want to allow SSH from the WAN port (you did remember to set a strong password, right?), you can use the following command to open up port 22 on the WAN interface:

iptables -A INPUT -i eth0.101 -p tcp -m tcp --dport 22 -j ACCEPT

If you want to be security conscious, you should modify /etc/ssh/sshd_config to disallow root logins and create a separate user to log in with. You may also want to run sshd on a non-standard port (be sure to adjust the firewall rule above accordingly).
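A minimal sketch of the relevant sshd_config directives; the port number here is only an example:

# /etc/ssh/sshd_config
PermitRootLogin no
Port 2222

After changing these, restart sshd (on Bananian/Debian, /etc/init.d/ssh restart) and confirm you can still log in before closing your current session.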

To make these rules persistent after reboots, run the following:

cat << EOF  > /etc/network/if-pre-up.d/iptables
#!/bin/sh
iptables-restore --counters < /etc/iptables/rules.v4
exit 0
EOF
chmod 755 /etc/network/if-pre-up.d/iptables
mkdir -p /etc/iptables
iptables-save > /etc/iptables/rules.v4

There you have it. Your BPI-R1 should now be a fully functioning IPv4 router and wireless access point. Be sure to run regular updates for security using the following commands:

bananian-update
apt-get update
apt-get upgrade

Several Months of Operation

Lately my BPI-R1 has been frequently locking up, and in some cases, rebooting into a very insecure configuration. My next post will deal with the issues I’ve faced using this device as my primary router over the past year.

Banana Pi BPI-R1 with Enclosure

4k/UHD KVM Switches: The StarTech SV231MDPU2 and the IOGear GCS62DP

StarTech SV231MDPU2 KVM Switch

I recently purchased a 4K monitor which I intended to use with both my laptop and my desktop. Both machines support a resolution of 4096x2160 over their respective DisplayPort outputs. Individually, each machine works well with the monitor: the Windows laptop can drive it at 50Hz and the Linux desktop at 60Hz. I’ve owned many KVM switches in the past without major issues, so I was surprised to learn that using a KVM with a 4K/UHD monitor proved to have significantly more challenges than with previous interfaces.

Clarification: I use the terms 4K and UHD interchangeably in this post, but it's important to note they are two different things. UHD is generally accepted to be 3840×2160, while my current setup runs at 4096x2160 (4K DCI). I did attempt to use both KVMs in UHD mode, at my monitor's non-native resolution of 3840x2160 (mostly due to configuration challenges), in both Windows and Linux, and I ran into the same issues you see below at both UHD and 4K DCI resolutions. I didn't track how often these glitches occurred, so I can't comment on which resolution was more reliable. It's important to note that, officially, both of these KVMs only support the UHD resolution of 3840x2160. (Updated: 2016-10-30)

StarTech SV231MDPU2

StarTech SV231MDPU2 KVM Switch

The StarTech SV231MDPU2 has two Mini DisplayPort inputs and claims to support 4K at 60Hz. When I first attempted to use the StarTech, only one of my two machines would successfully display through the KVM. I had been using the cables that came with the StarTech, and after several attempts at swapping in different cables, I did get both screens working. However, they would frequently cut out and take five to ten seconds to reinitialize. StarTech support was helpful. Even though I was using the cable that came with the KVM, they informed me that the total length from machine to KVM and then KVM to monitor should be less than three meters (10 feet). I was even cross-shipped a replacement to help solve my disconnect issues.

Most KVMs support using a hotkey, typically a key not commonly used such as Scroll Lock or System Request, to switch between machines. This key is usually platform independent, but not with the StarTech. It requires a program to be installed in Windows in order for hotkey switching to work. At the time, I didn’t have a keyboard with a Scroll Lock, and the hotkey was not reconfigurable to another key. However, the odd software based implementation of the hotkey allowed me to switch away from the Windows laptop using the on-screen accessibility keyboard.

Windows on-screen keyboard

Unfortunately I couldn’t switch back from Linux using the hot key, but the KVM did have a physical switching button. Had I been able to get the StarTech more stable without frequent display interruptions, I would have looked into reverse engineering their Windows program to see if I could send equivalent HID messages to the KVM from Linux in order to enable hotkey based switching. However the disconnection issues led me to attempt using another KVM switch.

IOGear GCS62DP

IOGear GCS62DP

I’ve used several IOGear KVM switches in the past and they’ve all worked pretty much without issue. The IOGear GCS62DP has two full sized DisplayPort inputs. Its hotkey switching is implemented in the KVM hardware, so it works without the need of special drivers and is therefore operating system independent. As with other IOGear switches I’ve used, the mouse emulation layer didn’t work for my multi-button mouse. However, it was easy to disable mouse emulation using a hotkey combination that’s documented in the manual.

At first the IOGear seemed to perform better than the StarTech. However, there were still issues with it dropping and reconnecting the video signal. It wasn’t as bad as the StarTech, but it was still quite frequent. The dropouts weren’t even predictable: they could occur during graphics-intensive applications such as watching movies or playing video games, but they also seemed to occur when all I was doing was editing a text file.

The only task I could do to get the signal to predictably disconnect was using Google Maps in Chrome, which apparently renders at an extremely high framerate.

Both of the KVMs also occasionally have artifacts, similar to when a digital broadcast TV signal cuts in and out due to noise and interference. The following two videos show artifacts and glitches on both my Windows laptop and Linux desktop:

Best Practices

There is no physical difference between newer DisplayPort 1.2a cables and older 1.1 cables1; the version number refers to the protocol run over the same hardware. Still, 4K @ 60Hz is a lot of data. In order to maintain a 4K resolution at a 60Hz refresh rate, the DisplayPort interface on your video card needs to support High Bit Rate 2 (HBR2), and it will come close to maxing out that connection as it transmits nearly 18 gigabits (2.25 gigabytes) per second. With that quantity of data, poor quality cables can lead to noise and signal degradation, causing disconnects and cutouts. Check reviews to find good quality cables, and moreover, buy short cables. Cable length is critical: ideally the cables between the PC and KVM and between the KVM and monitor should each be no longer than 1 meter (3.3 feet). Remember, your total cable length between all three devices should not exceed 3 meters (10 feet).
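As a rough sanity check on that figure, here’s a back-of-the-envelope estimate of the raw pixel rate, assuming 24-bit color and ignoring blanking intervals and link-encoding overhead:

# 4096 x 2160 pixels, 60 frames per second, 24 bits per pixel
echo '4096 * 2160 * 60 * 24 / 10^9' | bc -l
# ~12.7 Gbit/s of pixel data; blanking and 8b/10b encoding push the
# on-wire rate toward the HBR2 ceiling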

Conclusions

I’ve purchased and tried several different cables with both KVMs, yet I’m still having trouble with the signal cutting out. There are not a lot of options currently for 4K KVM switches that work at high refresh rates (or at least claim to). There are others that seem to have good reviews, but the prices quickly jump. I admit I may have just had bad luck with cables. If not, for the time being it seems like affordable consumer-level 4K KVMs may still be problematic and buggy for early adopters.


Banana Pi BPI-R1 Fails Into an Insecure State

BPI-R1

Previously I had written a guide to using a Banana Pi BPI-R1 as a router. As I write this, I’ve been running the BPI-R1 as my home gateway/firewall for approximately nine months. Initially I had problems with the router freezing and needing to be power-cycled every few weeks. Although this is somewhat commonplace and accepted on consumer commodity routers, it shouldn’t be necessary on a piece of hardware designed for hobbyists. Furthermore, there were other stability and hardware issues that could cause the BPI-R1 to reboot as a switch, with public IPs being assigned to internal machines. This effectively disabled the firewall, leaving internal machines in a potentially vulnerable state.

For some reason, dmesg -w doesn’t continue to follow kernel log messages on Bananian. Eventually I bought a monitor that had HDMI inputs, so I could look at the console following one of these freeze-ups.

BPI-R1 crash

I suspected that I might have a faulty sdcard causing the router to crash. I also noticed the crashes would occur when pulling large amounts of data over Wi-Fi (i.e. when running a large rsync operation). I replaced the sdcard with a new one, and the crashes did stop. However, the Wi-Fi occasionally stopped responding as an access point. At least now it no longer takes the rest of the BPI-R1 down with it. I’ve attempted restarting hostapd to fix the Wi-Fi issues, but only a full reboot seems to get it back into a functional state.

BPI-R1 crash screen

One thing that bothered me is that if the BPI-R1 reboots and is unable to load its operating system, its default role is to become a simple switch between every Ethernet port, including the WAN port (they all share the same Ethernet controller and are only separated by VLAN configuration, remember?). This led to my internal machines occasionally getting real, public IP addresses.

Private machine with a public IP

This is a pretty big security issue: I now have internal machines unintentionally exposed to the world because the gigabit Ethernet is shared between all the ports. Wi-Fi devices aren’t affected, as they depend on the software bridge, which requires a running OS. There isn’t really a fix for this situation from the operating system’s perspective. The BPI-R1 should simply disable the Ethernet switch if it fails to load an operating system, or require that the switch be enabled explicitly by the operating system after it boots.

I’m currently looking at other solutions to replace my BPI-R1. As it stands, the BPI-R1 is unreliable, and shares physical Ethernet between WAN/LAN ports instead of using dedicated controllers. Its Wi-Fi adapter is not trivial to replace, as it’s soldered onto the board and connected to the USB bus.

Most home routers are terribly insecure to begin with. Many don’t auto-update, or accept unsigned updates, making them a large attack vector for hackers. Even newer devices released this year by major manufacturers are still filled with a crazy number of security holes1. There is no shortage of attack vectors for this new Internet of Things, as it’s been labeled. Still, I’m curious how many commodity home routers share Ethernet between their WAN and LAN ports. What percentage of these home firewalls would reboot as a switch if they failed to load their operating systems? Could this be yet another easily exploitable and difficult-to-patch attack vector against commodity embedded hardware?

  1. D-Link DWR-932 router is chock-full of security holes. 29 Sept 2016. Zorz. Help Net Security.

Review: ClearFog Pro

ClearFog Pro

Back in February, I decided to use a Banana Pi BPI-R1 as my primary router. There wasn’t a lot of documentation on setting up the R1 as a router, and understanding the port/vlan mapping was a little complicated, so I wrote a tutorial. The BPI-R1 only has one Gigabit Ethernet controller, shared between the WAN and LAN ports and configured via vlans, which I found could result in potential security issues. Due to those stability and security issues, I decided to purchase a ClearFog Pro, which features separate Ethernet controllers for its switch, primary port and SFP port. However, what I soon found was a disappointing mess of hardware and software. The manufacturer has refused my request for a return, leaving me with a worthless $240 USD brick.

When the ClearFog Pro arrived, I was impressed. The housing was solid metal and the board looked very well designed. However, it came with absolutely no extra screws for the M.2 and mPCIe slots, nor did it come with full-size to half-size mPCIe brackets. I was able to find a screw for the rear M.2 socket, however none of my screws would fit the mPCIe risers. I ordered extension brackets and simply taped them down.

Taped mPCIe Wi-Fi Card
M.2 SSD

The Console

The ClearFog Pro doesn’t have any display outputs, but you don’t need to invest in a serial-to-USB cable to access the console. You can use the ftdi_sio kernel module, or set CONFIG_USB_SERIAL_FTDI_SIO in your configuration if you build a custom kernel. When you connect to the ClearFog’s console with a standard microUSB cable, you should see either a /dev/ttyACM0 or a /dev/ttyUSB0 device. Install picocom or minicom from your distribution’s package manager to connect to this new serial device. Be sure to specify the baud rate like so:

sudo picocom -b 115200 /dev/ttyACM0
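If you’re not sure which device node appeared, the kernel log will show the new serial device when you plug in the cable:

dmesg | tail
ls /dev/ttyACM* /dev/ttyUSB*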

Keep in mind the console is initialized by the bootloader, so you will need an sdcard flashed with the appropriate U-Boot before you’ll see any output. The boot sequence should look something like the following:

...
BootROM: Image checksum verification PASSED

 __   __
|  \/  | __ _ _ ____   _____| | |
| |\/| |/ _` | '__\ \ / / _ \ | |
| |  | | (_| | |   \ V /  __/ | |
|_|  |_|\__,_|_|    \_/ \___|_|_|
         _   _     ____              _
        | | | |   | __ )  ___   ___ | |_
        | | | |___|  _ \ / _ \ / _ \| __|
        | |_| |___| |_) | (_) | (_) | |_
         \___/    |____/ \___/ \___/ \__|
 ** LOADER **


U-Boot 2013.01 (Jun 05 2016 - 07:56:34) Marvell version: 2015_T1.0p11

Board: A38x-Customer-Board-1
SoC:   MV88F6828 Rev A0
       running 2 CPUs
CPU:   ARM Cortex A9 MPCore (Rev 1) LE
...

Running Arch Linux ARM from the M.2 SSD

I was excited about the fact that this board has an M.2/SATA port on the back. However, upon looking through the documentation, booting from the M.2 device requires changing some dip switches, using a different version of U-Boot and desoldering a resistor on the back of the board. Even after doing all of this, the ClearFog still needs to pull its environment file from MMC (the sdcard), so you still need the sdcard1.

With Arch, it was pretty simple to use the sdcard as a /boot partition while placing the primary file system on the M.2 SSD. I simply created one large partition on the SSD and extracted the same image I had used for the sdcard. On the sdcard, I moved everything in /boot to the root (/) and erased the other directories. To get the kernel to load /dev/sda1 as my new root, I set the root parameter in uEnv.txt:

root=/dev/sda1

The bootloader also needs to be adjusted to load the kernel from / instead of /boot. For that, clearfog.env needs to be regenerated. The process can be found in the build scripts for Arch’s ClearFog package2. Simply create a clearfog.txt with bootdir set to /:

bootcmd=run loaduenv; run startboot
bootdir=/
bootfilez=zImage
console=ttyS0,115200
loadaddr=2080000
rdaddr=2880000
fdtaddr=2040000
fdtdir=/dtbs
fdtfile=armada-388-clearfog.dtb
root=/dev/mmcblk0p1
mainargs=setenv bootargs console=${console} root=${root} rw rootwait ${optargs}
loadkernel=ext4load mmc 0:1 ${loadaddr} ${bootdir}/${bootfilez}
loadfdt=ext4load mmc 0:1 ${fdtaddr} ${fdtdir}/${fdtfile}
startboot=run mainargs; run loadkernel; run loadfdt; bootz ${loadaddr} - ${fdtaddr}
loaduenv=echo Checking for: ${bootdir}/uEnv.txt ...; if test -e mmc 0:1 ${bootdir}/uEnv.txt; then ext4load mmc 0:1 ${loadaddr} ${bootdir}/uEnv.txt; env import -t ${loadaddr} ${filesize}; echo Loaded environment from ${bootdir}/uEnv.txt; echo Checking if uenvcmd is set ...; if test -n ${uenvcmd}; then echo Running uenvcmd ...; run uenvcmd; fi; fi;

Then regenerate the clearfog.env file.

mkenvimage -s 0x10000 -o clearfog.env clearfog.txt

Rerun the sd_fusing.sh script. It will take the new clearfog.env, merge it with the bootloader and place it in the boot sector of the sdcard.

Finally, update the /etc/fstab on the SSD to mount the sdcard as /boot. This will allow the operating system to get kernel and bootloader updates via the package manager.

# <file system>	<dir>	<type>	<options>	<dump>	<pass>
/dev/mmcblk0p1 /boot ext4 auto,noatime 1 2

ClearFog Wi-Fi/Bluetooth

I had problems with Wi-Fi from day one. The Intel 3160 Wi-Fi adapters I purchased would not work with the Arch ARM image for the ClearFog Pro. The Bluetooth module would get into a continual reload loop:

[  538.794399] usb 1-1: new full-speed USB device number 31 using orion-ehci
[  538.950815] usb 1-1: New USB device found, idVendor=8087, idProduct=07dc
[  538.957540] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[  538.980818] Bluetooth: hci0: read Intel version: 3707100100012d0d00
[  538.987175] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.0.1.2d.d.bseq
[  539.197831] Bluetooth: hci0: Intel Bluetooth firmware patch completed and activated
[  539.765912] usb 1-1: USB disconnect, device number 31
[  540.094346] usb 1-1: new full-speed USB device number 32 using orion-ehci
[  540.250815] usb 1-1: New USB device found, idVendor=8087, idProduct=07dc
[  540.257544] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[  540.283817] Bluetooth: hci0: read Intel version: 3707100100012d0d00
[  540.290171] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.7.10-fw-1.0.1.2d.d.bseq
[  540.501816] Bluetooth: hci0: Intel Bluetooth firmware patch completed and activated
[  541.071191] usb 1-1: USB disconnect, device number 32
[  541.404291] usb 1-1: new full-speed USB device number 33 using orion-ehci
...
...

When attempting to load the Intel Wi-Fi modules, I’d get the following:

[  512.973436] Intel(R) Wireless WiFi driver for Linux
[  512.978347] Copyright(c) 2003- 2015 Intel Corporation
[  512.986975] iwlwifi 0000:02:00.0: loaded firmware version 17.352738.0 op_mode iwlmvm
[  513.002979] iwlwifi 0000:02:00.0: Detected Intel(R) Dual Band Wireless AC 3160, REV=0xFFFFFFFF
[  513.011678] iwlwifi 0000:02:00.0: L1 Disabled - LTR Disabled
[  513.034620] ------------[ cut here ]------------
[  513.039259] WARNING: CPU: 1 PID: 2771 at drivers/net/wireless/iwlwifi/pcie/trans.c:1552 iwl_trans_pcie_grab_nic_access+0x118/0x124 [iwlwifi]()
[  513.052066] Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
[  513.058867] Modules linked in: iwlmvm(+) iwlwifi mac80211 btusb btrtl btbcm cfg80211 btintel bluetooth rfkill marvell_cesa des_generic sch_fq_codel ip_tables x_tables autofs4 [last unloaded: iwlwifi]
[  513.076821] CPU: 1 PID: 2771 Comm: modprobe Tainted: G        W       4.4.23-2-ARCH #1
[  513.084754] Hardware name: Marvell Armada 380/385 (Device Tree)
[  513.090696] [<c0017b64>] (unwind_backtrace) from [<c0013264>] (show_stack+0x10/0x14)
[  513.098460] [<c0013264>] (show_stack) from [<c038bc9c>] (dump_stack+0x84/0x98)
[  513.105701] [<c038bc9c>] (dump_stack) from [<c002917c>] (warn_slowpath_common+0x80/0xb0)
[  513.113811] [<c002917c>] (warn_slowpath_common) from [<c00291dc>] (warn_slowpath_fmt+0x30/0x40)
[  513.122535] [<c00291dc>] (warn_slowpath_fmt) from [<bf2591b8>] (iwl_trans_pcie_grab_nic_access+0x118/0x124 [iwlwifi])
[  513.133182] [<bf2591b8>] (iwl_trans_pcie_grab_nic_access [iwlwifi]) from [<bf24e20c>] (iwl_read_prph+0x24/0x78 [iwlwifi])
[  513.144172] [<bf24e20c>] (iwl_read_prph [iwlwifi]) from [<bf2587b8>] (iwl_pcie_apm_init+0x22c/0x278 [iwlwifi])
[  513.154207] [<bf2587b8>] (iwl_pcie_apm_init [iwlwifi]) from [<bf25ba34>] (iwl_trans_pcie_start_hw+0x50/0xe0 [iwlwifi])
[  513.164971] [<bf25ba34>] (iwl_trans_pcie_start_hw [iwlwifi]) from [<bf27d7a0>] (iwl_op_mode_mvm_start+0x490/0x6a8 [iwlmvm])
[  513.176166] [<bf27d7a0>] (iwl_op_mode_mvm_start [iwlmvm]) from [<bf24e938>] (iwl_opmode_register+0x84/0xcc [iwlwifi])
[  513.186839] [<bf24e938>] (iwl_opmode_register [iwlwifi]) from [<bf2a2034>] (iwl_mvm_init+0x34/0x5c [iwlmvm])
[  513.196724] [<bf2a2034>] (iwl_mvm_init [iwlmvm]) from [<c000985c>] (do_one_initcall+0x90/0x1d4)
[  513.205445] [<c000985c>] (do_one_initcall) from [<c00bd9cc>] (do_init_module+0x60/0x374)
[  513.213556] [<c00bd9cc>] (do_init_module) from [<c0096cb8>] (load_module+0x1a58/0x1fd4)
[  513.221581] [<c0096cb8>] (load_module) from [<c009737c>] (SyS_init_module+0x148/0x160)
[  513.229517] [<c009737c>] (SyS_init_module) from [<c000f580>] (ret_fast_syscall+0x0/0x1c)
[  513.237624] ---[ end trace 71ba1aa822557891 ]---
[  513.362979] iwlwifi 0000:02:00.0: L1 Disabled - LTR Disabled
[  519.385699] iwlwifi 0000:02:00.0: Failed to load firmware chunk!

On the officially supported Debian Jessie image released by SolidRun, I couldn’t load any modules because they were all gzipped and the image didn’t ship a version of module-init-tools that supported loading modules from gzipped .ko files.

# modprobe iwlwifi
modprobe: ERROR: ../libkmod/libkmod.c:557 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.8-devel-16.06.0-clearfog/modules.dep.bin'

# depmod -a
# modprobe iwlwifi
modprobe: FATAL: Module iwlwifi not found

# find /lib -name "iwl*"
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlwifi
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlwifi/dvm/iwldvm.ko.gz
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlwifi/mvm/iwlmvm.ko.gz
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlwifi/iwlwifi.ko.gz
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlegacy
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlegacy/iwlegacy.ko.gz
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlegacy/iwl3945.ko.gz
/lib/modules/4.4.8-devel-16.06.0-clearfog/kernel/drivers/net/wireless/iwlegacy/iwl4965.ko.gz

Per a forum discussion3, it seemed like there might have been some issues with the Solid Run kernel and the Intel drivers/firmware. I decided to order a different Wi-Fi adapter and focus on getting basic Ethernet routing working. That would be my next great disappointment.

ClearFog Ethernet

The ClearFog has one independent Ethernet port (which most would use as a primary or WAN port), one SFP port and one set of switching Ethernet ports (typically used as the LAN ports in a router configuration). I was able to get the single/WAN port working with no issues. Unfortunately, the switching LAN ports were totally non-functional. The kernel detected the switch as eth1 as shown:

[    4.907842] mvneta f1030000.ethernet eth1: [0]: detected a Marvell 88E6176 switch

I also had individual lan1 through lan6 devices. I attempted to use bridge-utils to create a bridge br0, add all the individual lan* devices to it, and bring it up with an IP address, but I wasn’t able to connect to any devices attached to the switch. I attempted to assign an address to eth1 directly without using a bridge, to add eth1 to a bridge with all the lan* devices, and several other combinations. I even attempted to install swconfig from the Arch AUR to see if I could configure the switch with it, but it failed to find any switching devices.
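For reference, the bridge attempt looked roughly like the following. This is a sketch of what I tried rather than a working configuration, and the address is just an example:

ip link set eth1 up
brctl addbr br0
for i in 1 2 3 4 5 6; do
    ip link set "lan$i" up
    brctl addif br0 "lan$i"
done
ip addr add 192.168.1.1/24 dev br0
ip link set br0 up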

When I tried to run the officially supported Debian image released by SolidRun, I got the following in my kernel logs:

[ 2197.651279] mvneta f1030000.ethernet eth1: [0]: could not detect attached switch
[ 2197.658743] mvneta f1030000.ethernet eth1: [0]: couldn't create dsa switch instance (error -22)
[ 2198.905051] mvneta f1030000.ethernet eth1: [0]: could not detect attached switch
[ 2198.912500] mvneta f1030000.ethernet eth1: [0]: couldn't create dsa switch instance (error -22)
[ 2200.155098] mvneta f1030000.ethernet eth1: [0]: could not detect attached switch
[ 2200.162546] mvneta f1030000.ethernet eth1: [0]: couldn't create dsa switch instance (error -22)
[ 2201.405104] mvneta f1030000.ethernet eth1: [0]: could not detect attached switch

There is another version of this board, known as the ClearFog Base, which has two Ethernet ports instead of a switch. According to a forum post I discovered, the supported base image for the ClearFog Base had the wrong device tree configuration; it contained the configuration for the ClearFog Pro. Other users showed how to reconfigure and rebuild the dts file, but the solution didn’t seem to work for everyone4.

I started discovering more posts about the Ethernet controller, SFP cage and switch, all indicating various problems and fixes. Judging from the above output on the officially supported Debian image, I suspect that the switch on my particular board may simply be defective.

ClearFog Support

When I saw this board, it looked like it had everything I needed for a basic home gigabit router setup to replace my BPI-R1. However after dealing with the issues I’ve gone over, I decided this board wasn’t worth any further effort. I contacted SolidRun support, mentioning the problems I’ve stated above and that I wanted to return the item. Rather than address any of the problems or answer my question about obtaining an RMA, I received the following:

Dear Sumit,

thank you for your email.

We currently working on Kernel and Debian + LEDE/OpenWRT to get it running better -
but this product is still in an early stage and please keep in mind that the board
is an evaluationboard/reference board, as most customers use our microsom and
build up their own carrierboard.

Kind regards

Malte  - Team Support

When ordering either the ClearFog Base or Pro on their website, there were options for a full enclosure, other customizations and even a list of supported operating systems. They even claimed the ClearFog series was a “ready to deploy solution.” Except for a statement about using the sdcard version for development and the eMMC version for production, there was nothing to indicate that this was a non- or semi-functional evaluation board.

Product description from the SolidRun Website

I replied to this e-mail indicating I simply wanted to return the device. I should have read their warranty information before ordering. Although the device is under warranty for a year, the return period is only fourteen days5. I still haven’t received a reply to my second e-mail requesting a return.

Final Review

First, I’d like to thank the Arch Linux ARM community for all their help. I found people in the forums and IRC channels supportive and helpful. There are a couple of people in the Arch/ARM and Armbian communities that have been testing and working with the ClearFog Pro and documenting their work.

That being said, I cannot recommend the ClearFog Pro. SolidRun has little to no support for their board. From what I’ve seen, almost all problems were resolved by other users and developers in the community. Several issues seemed to be due to defective hardware as well. I first learned about the ClearFog Pro several months ago when I was setting up a Banana Pi BPI-R1, so it’s been on the market for at least a year and shouldn’t have these basic hardware issues. It’s marketed as being a “ready to deploy” solution, however there are both major hardware issues and software issues in the officially supported operating systems and kernels. What seemed like it would be an amazing router board has turned into nothing but frustration and disappointment.

If you’re part of a startup with dedicated and experienced kernel engineers and embedded developers, purchasing one of these as a development board might be worth the effort. If you’re lucky enough to get one that functions correctly, there is a good chance you will be doing SolidRun’s job for them by creating working kernel and software patches. If you’re a hobbyist looking for an easy, customizable networking solution, or a company looking for a quick path to a hardware deliverable, you should look elsewhere.

Building a Thin-ITX Router

Thin-ITX Router

I had been using a Banana Pi BPI-R1 as my router. Due to some reliability issues, I attempted to replace it with a ClearFog Pro, which also met with unfavorable results. Many hobbyists tend to use old PCs as routers, as I have in the past, but due to some scaling down I no longer have a bucket of spare parts to build a low-powered Linux box. Instead of going with another ARM solution, I decided to build a custom x86_64 system based on the Thin-ITX form factor. I found that an x86/Thin-ITX solution was more reliable than the ARM alternatives I had tried, and it cost about the same as a high-end home router.

In 2003, I was in University and was using an old Pentium 90 as a Linux router to provide Internet to my seven other housemates. At one point the hard drive on the router died and I couldn’t afford to replace it, so I ran dhcp/tftp on my file server (pictured to the right of the router), PXE booted the router and mounted its root file system via NFS. This meant that the file server had to be up first and the router booted remotely from it, before the router could serve Internet.

My Router when I was in University

A Pentium 90 was more than enough at the time to handle routing cable Internet for around eight devices. If I still had a Pentium 90, I’d be curious to run benchmarks against it and my current setup. Many of the components listed are definitely overkill for a router, even considering that I need to handle gigabit speeds.

Parts List

Part                                   Cost
ASUS H110T Thin-ITX Motherboard        $69.99
Intel Celeron G3900                    $42.18
Silverstone PT13B Thin-ITX Case        $55.99
Silverstone SST-AR04 CPU Cooler        $29.79
2x 4GB DDR4 (8GB) RAM                  $43.98
128GB M.2 2242 SSD                     $45.99
Intel 8260 M.2 802.11ac Wi-Fi Card     $22.49
2x 11” MHF4 to RP-SMA Pigtails         $7.95
120W 19V DC Power Brick                $19.99
Netgear 5-port Gigabit Switch          $19.99
2x +5dBi Wireless Antennas             $11.90
Router Parts
Router viewed from the top with case open

Lessons Learned

There were a couple of parts compatibility mistakes I made, which resulted in some returns. The DC power supply needed for Thin-ITX boards isn’t clearly stated in any of the manuals I looked at, but after some research I realized that Thin-ITX boards use a circular plug with a 7.5mm outer diameter and a 5mm inner diameter. The exact size may vary on some models by a few tenths of a millimeter, but it’s the plug used on many HP laptops. The ASUS H110T supports both 12V and 19V power supplies, with different wattage requirements depending on the type of processor. Be sure to check your specific board and processor to make sure you buy the correct power brick.

I also made the mistake of purchasing an Intel HTS1155LP fan, which doesn’t fit in a Thin-ITX case of this form factor. It’s designed for wider cases that have room for the heat sink to dissipate heat off to the side of the motherboard. The Silverstone SST-AR04 fits directly on top of the processor and leaves room for USB header access. Despite its low profile, it does a good job of dissipating heat from one of these low-powered processors.

In this setup, I also have an external switch. There are Thin-ITX boards with multiple gigabit Ethernet ports, some of which even have a built-in Atom processor and memory. However, the ones I found only had VGA output. My monitor only supports DisplayPort/HDMI, and I wanted to avoid needing an additional adapter or relying on the board booting up with a headless Linux distribution (one that enables SSH and DHCP on boot for remote headless installation/diagnostics).

Some of the Jetway boards offered a serial console, but that would still require yet another serial-to-USB adapter, plus the horizontal PCI-E expansion slots on some of these boards would require modifying a standard Thin-ITX case1. Using this board plus a switch ended up not being significantly more expensive than a Thin-ITX board with multiple Ethernet ports built in.

Router viewed from the top with case open and switch

Operating System

At first I looked at Alpine Linux, but I became frustrated trying to get it to install to disk and boot from UEFI. The testing branch of Alpine does have a grub2 package (the base uses the syslinux bootloader), but I didn’t want to run a testing branch as the OS for my primary router. The Alpine wiki currently has working instructions for creating a UEFI USB boot image2, but its GPT wiki page still seems to instruct users to boot in legacy/MBR mode3. Once tools like efibootmgr have Alpine packages45 and grub2 is in the stable branch, I’m sure this will be trivial. As it stands, I couldn’t get the stock kernel to boot via rEFInd or Gummiboot. It may have been missing EFI stub support, or it may simply have been my configuration, but I eventually gave up without finding a solution.

I realize this may tread on ideological differences, but the fact remains that systemd has been one of the most divisive topics in the Linux community. I prefer to avoid using it for a number of reasons I won’t go into here. Although I use Gentoo and Funtoo on my desktop and laptop respectively, I wanted a binary distribution for this router.

I decided to take a chance and try Void Linux. With its first release in 2008, Void Linux is a relatively new, minimalistic distribution. The documentation and installation are aimed at an audience that’s already familiar with Linux, but even considering that, it was dead simple. The runit init system is quite different, but still well documented. All the packages I’d need for a router, such as iptables, openvpn, hostapd and others, were available in the default package tree. I had Void Linux up and running as a router in just a couple of hours.
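As a rough illustration of how simple the runit workflow is: installing a package and enabling its service is just an xbps install plus a symlink. A minimal sketch; the exact package and service names may differ from what I actually used:

# install some router staples from the default repository
xbps-install -S iptables hostapd openvpn

# runit service definitions live in /etc/sv; enabling one is a symlink into /var/service
ln -s /etc/sv/sshd /var/service/
ln -s /etc/sv/hostapd /var/service/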

Conclusions

The build I put together is a little overkill, and a perfectly good router could be built for less by using less RAM, smaller/cheaper storage and an ITX board with a built-in processor. An older Core 2 system with gigabit Ethernet would be just as capable as a router.

The big advantage of an Intel-based solution over an ARM solution is the choice of operating systems. Any standard x86/64 Linux distribution has hardware support for most major components, whereas most ARM distributions need to be customized for specific boards. Although there’s been some work to bring better standards to ARM boards with things such as Device Trees, the landscape is still a mess and ARM fragmentation is still a major issue.

Overall the router has been stable, over both wired and Wi-Fi connections. Void Linux has been easy to work with, and the hardware is way overkill for a router. A system like this would also work well as a media center PC or a low-powered gaming system. What is truly amazing is the Thin-ITX form factor, which crams an incredible amount of standard x86/64 hardware into a very small amount of space.

Router viewed from the top with case open
  1. Jetway NF9HG-2930 + Silverstone PT13 Slim ITX. 16 June 2015. Wiretap. PF Sense Forums. Retrieved 9 November 2016.

  2. Create UEFI boot USB - Alpine Linux. Retrieved 9 November 2016. Alpine Linux Wiki. Archived Version

  3. Installing on GPT LVM - Alpine Linux. Retrieved 9 November 2016. Alpine Linux Wiki. Archived Version

  4. Feature #3819: A efibootmgr package would be useful. 28 January 2015. Alpine Linux Bug Tracker. Retrieved 9 November 2016.

  5. Feature #5730: Package request: efibootmgr. 17 June 2016. Alpine Linux Bug Tracker. Retrieved 9 November 2016.

Installing Mesosphere DC/OS on Small Digital Ocean Droplets

DC/OS Logo

Mesosphere DC/OS is a data center operating system based on Apache Mesos and Marathon. It’s designed to run tasks and containers on a distributed architecture. It can be provisioned on bare metal machines, within virtual machines or on a hosting provider (what some people like to call “the cloud”). I wanted to see what was involved in setting up my own DC/OS instance, both locally and with a provider, for running some of my own projects in containers. I wanted to keep this cluster as low-cost as possible, and ran into some issues with the Terraform installation in the DC/OS documentation1. The following is a brief look at setting up a minimal DC/OS cluster on Digital Ocean.

Provisioning

For one of my projects, I created vSense, a devops provisioning system built around Vagrant and Ansible. It’s used for creating both development and production environments for BigSense, an open source sensor network system. Vagrant boxes can vary between providers, meaning the scripts need to be adjusted to handle differences between VirtualBox images for development and KVM base boxes for production. Thankfully, DC/OS has an official Vagrant project2 and supports deploying to hosted providers using a Terraform script1.

The following can be used to bring up a local four node cluster (boot, manager, private agent and public agent) using local VirtualBox VMs:

git clone https://github.com/dcos/dcos-vagrant
cd dcos-vagrant
vagrant plugin install vagrant-hostmanager
vagrant up m1 a1 p1 boot
DC/OS Nodes Running in Vagrant

DC/OS provides documentation for installing nodes on several hosted platforms as well. The following is taken from their documentation for using Digital Ocean as a provider:

git clone https://github.com/jmarhee/digitalocean-dcos-terraform
cd digitalocean-dcos-terraform
cp sample.terraform.tfvars terraform.tfvars
# adjust your settings and API token
eval $EDITOR terraform.tfvars
terraform apply
DC/OS Nodes Running on Digital Ocean Droplets

It’s important to note that DC/OS, despite its name, is not really an operating system. It simply installs Docker and other packages to bootstrap itself on another Linux distribution. When using the Vagrant/VirtualBox installation above, it uses CentOS 7 for its individual virtual machines. Curiously for Digital Ocean, it installs itself onto CoreOS virtual machines.

Authentication

If you start with a fresh install of DC/OS and connect to the master node via HTTP, you’ll get an authentication page allowing the first account to become the administration account. By default, you cannot create this account locally; you are required to use one of the three default identity providers: Github, Microsoft or Google. The DC/OS community edition has no built-in authentication system. In order to integrate with LDAP, Active Directory or another identity provider, you must purchase the enterprise edition. The community edition allows you to override the default configuration, but it only supports OAuth providers and only provides documentation for using the non-free service Auth03.

DC/OS Initial Login Screen

I really hesitated here. I rarely use external authentication, opting for a strong password algorithm with e-mail based registration instead. I considered figuring out how to override the default, but then caved to my impatience and authenticated via Github. This was a bad idea. Not only did I start getting unsolicited SPAM from Mesosphere at the e-mail address associated with my Github account:

Unsolicited E-Mail from Mesosphere to an E-mail associated with my Github Account

I also started getting SPAM at the e-mail address of a secondary account I created within DC/OS.

Unsolicited E-Mail from Mesosphere to an E-mail for an account I created

Furthermore, the e-mail for the new user I manually created didn’t come from a locally running mail server that was part of DC/OS. It was relayed via a completely different third party:

From: DC/OS <help@dcos.io>
Subject: You've been added to a DC/OS cluster
Received: from [54.163.223.191] by mandrillapp.com id dafa457b3e374123b427c283824bfa0f; Sat, 26 Nov 2016 06:58:25 +0000
X-SWU-RECEIPT-ID: log_aad8fa045b93454cca9d5a9ccabc3504-3
Reply-To: <help@dcos.io>
To: <--->

Also, by default, DC/OS has telemetry enabled. If you’re using the Terraform script for installation, it can be disabled by adding telemetry_enabled: 'false' to the make-files.sh script, in the section where it creates the config.yml4. I highly recommend you disable telemetry before starting up a cluster, even locally with Vagrant.
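For clarity, the change is a single YAML key in the generated configuration; the surrounding lines of the heredoc in make-files.sh are omitted here, since I’m only illustrating where the setting lands:

# fragment of the generated config.yml (other keys omitted)
telemetry_enabled: 'false'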

The SPAM didn’t arrive for a couple of days after I experimented with DC/OS. However, it still bothers me that the official DC/OS provisioning tools enable telemetry by default. It’s not as bad as removing tracking from Alfresco, which is hard coded, but it is unnecessary and is most likely used for marketing purposes.

Minimal Cost

As I’ve mentioned, the minimum number of DC/OS nodes required by default is four. Upon talking to other DC/OS administrators, I’ve found that it’s not necessary to separate out public and private nodes; if you run with only public nodes, your minimum drops to three VMs. By default, the Terraform script mentioned above provisions all of its nodes as 4gb droplets, which currently run $40 USD/month each on Digital Ocean.

If you’re a startup with funding, that isn’t an unreasonable price, even when you start scaling up for redundancy. However, if you’re a small shop trying to get off the ground with limited funding, or if you’re like me where you just want to host your personal projects cheaply, this can seem prohibitively expensive. The smallest size that Digital Ocean offers is a 512mb instance for $5 USD/month, which seems like it’d be more than adequate for the boot node.

Unfortunately, the management node must be a 1gb instance. Anything less leads to an unstable master. As we’ll see below, we can enable swap space on these nodes, but even the master agent is a heavy enough process that it will cause thrashing and lockups on anything less than 1gb of physical memory.

Boot     Management   Public    Price Per Month   Price Per Year
4gb      4gb          4gb       $120              $1440
512mb    1gb          2gb       $35               $420
512mb    1gb          2gb x2    $55               $660
512mb    1gb          1gb       $25               $300
512mb    1gb          1gb x2    $30               $360

Keep in mind that by not creating any private nodes, you are trading off the security offered by having non-public-facing containers (such as load balancers or web servers) running on nodes only connected to a private network. This is also a minimal, non-redundant solution. Redundancy requires either 3 or 5 master nodes, plus additional agent nodes.

Startup Issues

I wanted to use the smallest images possible to save on hosting costs. Unfortunately, both master and agent nodes refuse to start on anything smaller than 2gb images. If you have failures, you can SSH into the individual nodes using your SSH key, the IP address from the Digital Ocean web interface and the user core like so:

ssh -i do-key -lcore <node_ip>

The failures seem to occur during the bootstrapping process in the dcos-download.service:

journalctl -u dcos-download.service
-- Logs begin at Thu 2016-12-08 06:37:51 UTC, end at Thu 2016-12-08 07:31:18 UTC. --
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 systemd[1]: Starting Pkgpanda: Download DC/OS to this host....
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: *   Trying 104.131.142.20...
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: * TCP_NODELAY set
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: * Connected to 104.131.142.20 (104.131.142.20) port 4040 (#0)
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: > GET /bootstrap/e73ba2b1cd17795e4dcb3d6647d11a29b9c35084.bootstrap.tar.xz HTTP/1.
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: > Host: 104.131.142.20:4040
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: > User-Agent: curl/7.50.2
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: > Accept: */*
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: >
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < HTTP/1.1 200 OK
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < Server: nginx/1.11.6
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < Date: Thu, 08 Dec 2016 06:38:21 GMT
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < Content-Type: application/octet-stream
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < Content-Length: 581561548
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < Last-Modified: Thu, 08 Dec 2016 06:37:03 GMT
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < Connection: keep-alive
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < ETag: "5848ff8f-22a9eccc"
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: < Accept-Ranges: bytes
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: <
Dec 08 06:38:21 digitalocean-dcos-public-agent-00 curl[1568]: { [13032 bytes data]
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 curl[1568]: * Failed writing body (456 != 16384)
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 curl[1568]: * Curl_http_done: called premature == 1
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 curl[1568]: * Closing connection 0
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 curl[1568]: curl: (23) Failed writing body (456 != 16384)
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 systemd[1]: dcos-download.service: Control process exited, code=exited status=23
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 systemd[1]: Failed to start Pkgpanda: Download DC/OS to this host..
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 systemd[1]: dcos-download.service: Unit entered failed state.
Dec 08 06:38:27 digitalocean-dcos-public-agent-00 systemd[1]: dcos-download.service: Failed with result 'exit-code'.

If I try to download this file manually within the node, I can retrieve it successfully. The size of the file is over 500MB. Even the smallest node option of 512mb (memory), has 20GB of disk space. Then I looked at the individual partition tables:

1GB Image:

$df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        483M     0  483M   0% /dev
tmpfs           499M     0  499M   0% /dev/shm
tmpfs           499M  324K  499M   1% /run
tmpfs           499M     0  499M   0% /sys/fs/cgroup
/dev/vda9        27G  579M   26G   3% /
/dev/vda3       985M  588M  347M  63% /usr
tmpfs           499M  499M  4.0K 100% /tmp
/dev/vda1       128M   39M   90M  30% /boot
tmpfs           499M     0  499M   0% /media
/dev/vda6       108M   64K   99M   1% /usr/share/oem
tmpfs           100M     0  100M   0% /run/user/500

2GB Image:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        987M     0  987M   0% /dev
tmpfs          1003M     0 1003M   0% /dev/shm
tmpfs          1003M  428K 1003M   1% /run
tmpfs          1003M     0 1003M   0% /sys/fs/cgroup
/dev/vda9        37G  1.9G   34G   6% /
/dev/vda3       985M  588M  347M  63% /usr
tmpfs          1003M  320K 1003M   1% /tmp
tmpfs          1003M     0 1003M   0% /media
/dev/vda1       128M   39M   90M  30% /boot
/dev/vda6       108M   64K   99M   1% /usr/share/oem
tmpfs           201M     0  201M   0% /run/user/500

The installation services are using the /tmp partition, which is far too small to hold the downloaded bootstrap image. By default, tmpfs allocates half the available memory to its filesystem. The easy solution is to modify the section of make-files.sh that creates the do-install.sh script, so that /tmp has enough room prior to installation. The Digital Ocean droplets also don’t come with any swap, so we create some to avoid errors from running out of memory5.

...
cat > do-install.sh << FIN
#!/usr/bin/env bash
mkdir /tmp/dcos && cd /tmp/dcos

# resize the tmpfs to ensure there's space for the dcos install
sudo mount -t tmpfs -o remount,size=1G /tmp

# setup swap
if [ ! -f /swapfile ]; then
  sudo fallocate -l 2G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile
  echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
fi

printf "Waiting for installer to appear at Bootstrap URL"
...

We’re not making the /tmp change permanent in the fstab, so rebooting an instance will return the /tmp allocation to its default and clear out the installation files.
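
If you did want the larger /tmp to survive reboots, keep in mind that CoreOS mounts /tmp through systemd’s tmp.mount unit rather than the fstab, so a drop-in override is the cleaner route. The snippet below is an untested sketch; the drop-in path and option list are my assumptions, not something taken from the DC/OS scripts:

# Hypothetical drop-in: /etc/systemd/system/tmp.mount.d/size.conf
[Mount]
Options=mode=1777,strictatime,size=1G

# Apply it without rebooting
sudo systemctl daemon-reload
sudo systemctl restart tmp.mount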

IPv6

It is 2016, and a sustainable Internet means we need to start using IPv6. By default, the DC/OS Terraform scripts do not enable it. Adding the following setting to dcos.tf gives the public nodes IPv6 addresses. You can add it to the other node types as well if you want them reachable over IPv6.

...
resource "digitalocean_droplet" "dcos_public_agent" {
  name = "${format("${var.dcos_cluster_name}-public-agent-%02d", count.index)}"
  ipv6 = "true"
  depends_on = ["digitalocean_droplet.dcos_bootstrap"]
...
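
After updating dcos.tf, re-running Terraform applies the change. Depending on the Digital Ocean provider version, toggling ipv6 may force the droplets to be recreated, so review the plan output before applying:

terraform plan
terraform apply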

Thoughts on DC/OS

This tutorial simply covered the installation of DC/OS. We have only touched the surface; we haven’t discussed running application containers, using marathon-lb for load balancing, volume management, or security and firewall settings for individual nodes. None of these tasks is trivial, and each deserves a tutorial of its own.

Also, we only looked at Digital Ocean, but DC/OS has official documentation for deployments on AWS, Azure, GCE and Packet. I’d recommend comparing each to see where service costs can be reduced.

I’ve seen DC/OS deployed in the wild in full production environments. Its ability to schedule and manage tasks is very powerful, but it comes at the cost of needing a dedicated support and development team. If you’re a startup with strong development and operations engineers, setting up some kind of task or container orchestration, whether it’s DC/OS or something else, can help ease the pain of scaling out later. For smaller side projects, DC/OS seems prohibitive in both time and service costs.

  1. DigitalOcean DC/OS Installation Guide. Retrieved 10 December 2016. DC/OS.

  2. Install DC/OS with Vagrant. Retrieved 10 December 2016. DC/OS.

  3. Configuring Your Security. DC/OS Documentation. Retrieved 6 December 2016. Archive Version

  4. Opt-Out. DC/OS Documentation. Retrieved 6 December 2016. Archive Version

  5. How To Add Swap Space on Ubuntu 16.04. 26 April 2016. Digital Ocean.

Developer Workstation PC Build

Unopened boxes of parts

Earlier this year, I decided to build a development desktop. It’s the first PC I’ve fully built in at least four years. While I was backpacking, I relied solely on my laptop for development work. Prior to that I had used desktops people were giving away, or systems I had built years ago and just continually upgraded. Since this would be a Linux workstation aimed primarily at development, the hardware was focused on performance: 32GB of DDR4 memory, a 6700K i7 processor and dual M.2 NVMe solid state drives connected to the PCI-E bus in a software RAID0 (striped) configuration.

Parts List

I wanted a small form factor machine and had originally looked at using an Asus Z170I mini-ITX board with a Silverstone RVZ02B case. However, all the current ITX boards only come with one M.2 slot. A second M.2 drive can be added via a PCI-E adapter (which I’ve done on this build), but at the sacrifice of the only PCI-E slot. Going with a Micro ATX board leaves room for a video card and allows a higher memory capacity.

Type | Part | Cost
CPU | Intel Core i7-6700K 4.0GHz Quad-Core Processor | $349.99
CPU Cooler | Noctua NH-L9i 33.8 CFM CPU Cooler | $42.34
Motherboard | Gigabyte GA-Z170MX-Gaming 5 Micro ATX LGA1151 Motherboard | $138.99
Video | XFX Double D AMD Radeon HD 7850 2GB DDR5 (Used) | $110.00
Memory | G.Skill Ripjaws V Series 32GB (2 x 16GB) DDR4-3200 Memory | $189.99
Storage | Samsung 950 PRO 256GB M.2-2280 Solid State Drive | $181.95
Storage | Samsung 950 PRO 256GB M.2-2280 Solid State Drive | $181.95
Storage | Seagate Archive 8TB 3.5” 5900RPM Internal Hard Drive | $221.99
Case | Corsair Air 240 MicroATX Mid Tower Case | $79.99
Power Supply | EVGA SuperNOVA NEX 650W 80+ Gold Certified Fully-Modular ATX Power Supply | $78.10
Adapter | Lycom DT-120 M.2 PCIe to PCIe 3.0 x4 Adapter (Supports M.2 PCIe 2280, 2260, 2242) | $24.90
Misc | Taxes and Shipping | $39.51
Total | | $1639.71

Photos

Parts Pre-Build
Parts Pre-Build
Top Down View of Build
Top Down View of Build
CPU, Memory and Solid State Drive
CPU, Memory and Solid State Drive
Front/Bottom of Case
Front/Bottom of Case
Hard Drive and Power Supply
Hard Drive and Power Supply
Complete Build
Complete Build

Partition Scheme

M.2/NVME Solid State Drive with PCI-E Adapter
M.2/NVME Solid State Drive with PCI-E Adapter

NVMe devices show up in Linux as device nodes of the form /dev/nvme<number>n1. The n1 suffix identifies the actual block device and exists to support NVMe namespacing1. For this build, I configured nvme0n1 and nvme1n1 with an identical partition scheme.

Device            Start       End   Sectors   Size Type
/dev/nvme1n1p1     2048    196607    194560    95M EFI System
/dev/nvme1n1p2   196608    391167    194560    95M Linux RAID
/dev/nvme1n1p3   391168  31641599  31250432  14.9G Linux swap
/dev/nvme1n1p4 31641600 500118158 468476559 223.4G Linux RAID
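
As an aside, a listing in this format can be reproduced with fdisk; the column headers above match its GPT output. Run it against each drive:

sudo fdisk -l /dev/nvme0n1
sudo fdisk -l /dev/nvme1n1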

The purpose of each partition for nvme{0,1}n1 is as follows:

Partition | Purpose
p1 | RAID1 (0.90) + EFI System Partition (ESP)
p2 | RAID1 (0.90) + Linux Boot (Unencrypted)
p3 | LUKS + Swap (Encrypted)
p4 | RAID0 (1.2) + LUKS + Linux Root (Encrypted)

The ESP and boot partitions are set up as mirrored RAID. The older 0.90 metadata version must be used on the ESP so the UEFI boot process can still identify it as a FAT32 partition. The 1.2 metadata version, used on the primary striped (RAID0) partition, adds additional headers that make the underlying filesystem inaccessible to UEFI and anything else not designed to read Linux software RAID partitions. Grub does have an mdraid1x module that should have allowed the boot partition to use the 1.2 metadata as well; however, I had trouble getting Grub to recognize the boot partition that way.
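
For reference, arrays like these could be assembled with mdadm along the following lines. This is a reconstruction based on the /proc/mdstat output further down, not the exact commands I ran, so treat the device names, flags and the crypt-root mapping name as assumptions:

# Mirrored ESP and boot, forcing the older 0.90 metadata so UEFI still sees FAT32
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 /dev/nvme0n1p1 /dev/nvme1n1p1
mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=0.90 /dev/nvme0n1p2 /dev/nvme1n1p2

# Striped root array with the default 1.2 metadata
mdadm --create /dev/md2 --level=0 --raid-devices=2 --metadata=1.2 /dev/nvme0n1p4 /dev/nvme1n1p4

# LUKS sits on top of the striped array before the root filesystem is created
cryptsetup luksFormat /dev/md2
cryptsetup luksOpen /dev/md2 crypt-root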

In Gentoo, encrypted swap partitions can be automatically created on boot by adding the following entries to /etc/conf.d/dmcrypt as shown:

swap=crypt-swap1
source=/dev/nvme0n1p3
options='-c aes-xts-plain64 -h sha512 -d /dev/urandom -s 512'

swap=crypt-swap2
source=/dev/nvme1n1p3
options='-c aes-xts-plain64 -h sha512 -d /dev/urandom -s 512'

Swap doesn’t need to be on its own RAID device, since the Linux kernel will stripe across swap devices automatically if they’re given the same priority, as seen below2. They can then be added to /etc/fstab using their device mapper names:

/dev/mapper/crypt-swap1  none swap  auto,sw,pri=1  0 0
/dev/mapper/crypt-swap2  none swap  auto,sw,pri=1  0 0
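
Once the system is up, you can confirm that both encrypted swap devices are active with the same priority (the kernel will then interleave pages across them):

swapon --show
cat /proc/swaps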

RAID information from /proc/mdstat:

md2 : active raid0 nvme1n1p4[0] nvme0n1p4[1]
      468213760 blocks super 1.2 512k chunks

md1 : active raid1 nvme0n1p2[0] nvme1n1p2[1]
      97216 blocks [2/2] [UU]

md0 : active raid1 nvme0n1p1[0] nvme1n1p1[1]
      97216 blocks [2/2] [UU]

The full partition layout is illustrated by the following diagram:

Partition Map
Partition Map

Closing Remarks

My configuration has been optimized for performance at the expense of redundancy (i.e. the use of RAID0 striping). I repurposed an existing laptop and a high capacity external drive for backups, which are essential when running this type of configuration.

Every data storage partition in my configuration is encrypted except for the Linux boot partition. This provides a high degree of data security if the hardware is stolen. However, the hardware must still be physically secure, as tampering with either the ESP or boot partitions could allow an attacker to inject a custom bootloader to capture passwords.

Finished Build and Workstation
Finished Build and Workstation

After using this machine for nearly a year, I can say it’s been a stellar development system. With 32GB of RAM and the NVMe drives running in RAID0, I can run several VMs and compile large projects with ease. It has a decent enough video card for many of the independent titles that have come to Linux, although I’ve put Windows back on my laptop to be my primary gaming device.

Although the entire setup was a fun build, in many ways it felt like overkill. My previous laptop was more than capable as a development platform, and for a fraction of the cost I could have retrofitted an older used Xeon workstation with comparable memory and a solid state drive. The difference is measurable: tasks such as compiling, transcoding video, running complex queries and starting virtual machines are all noticeably faster than on my previous machines. However, many everyday tasks are not CPU or memory bound but network bound, where a faster workstation doesn’t make a significant difference.

Cost aside, I really enjoyed building a machine again. Being my first full build in several years, I had to research the current state of processors, chipsets and what made the most economical sense. I’m satisfied that for the price I paid, I’ve gotten a very reliable system that has served me very well for several months and will continue to be a good development workstation for some time to come.

  1. Finding your new Intel SSD for PCIe (think NVMe, not SCSI) . 10 October 2014. Ober. IT Peer Network (Intel).

  2. Setting up a (new) system - Linux RAID Wiki. Retrieved 18 Dec 2016. Linux RAID Wiki.

Self Driving Cars Will Not Solve the Transportation Problem

Tesla Model S
Tesla Model S

For the past couple of years, we’ve seen a substantial amount of research committed by the tech and auto industries to self driving vehicles. Billions of dollars are being spent on what is, depending on how you frame it, a solved problem. If the problem domain is transportation and growing populations within metropolitan areas, automated driverless trains have been a reality for quite some time. I’d argue that solving the domain of individual automated cars, while contributing significantly to the fields of machine learning and computer vision, is a wasted effort when it comes to sustainable transportation for the planet.

Recently Tesla suffered some major1 setbacks2 from what it has been mislabeling as self-driving vehicle3 accidents. No consumer level fully autonomous vehicles are driving on the world’s roads today; there are only test vehicles and pilot programs, such as Uber’s driverless system in Pennsylvania, which recently suffered the embarrassment of one of its vehicles driving the wrong way down a one way street4.

The idea of driverless cars draws on the romanticism of science fiction. It brings us one step closer to a world where physical labor is phased out by robotic counterparts. Having fully autonomous vehicles on traditional roads would be a technological marvel, but it would only solve some transportation problems.

Automated Trains

Front of a Driverless Train on Singapore's Mass Rapid Transport
Front of a Driverless Train on Singapore's Mass Rapid Transport

In London, the busiest lines of the subway system, namely the Victoria, Jubilee, Central and Northern lines, are all automated. The Victoria line has been automated since it opened in 1968, while the Central line converted to automated operation in the mid 1990s and the Northern line in 2012. Although many of the automated underground trains still have engineers in the cab who simply push buttons to open and close the doors, the Docklands Light Railway (DLR) has never had drivers since it became operational in 1987. By the mid 2020s, London plans to roll out a fleet of 250 trains capable of fully autonomous operation on existing subway lines, removing the need for a driver entirely5.

Singapore’s Mass Rapid Transit (MRT) system also includes fully automated lines, and the city-state’s public transport network handles 6.9 million trips a day6. Vancouver, Canada boasts the longest fully automated train system in the world, with a daily ridership of nearly 400,000 people. Other automated lines can be found in Barcelona, Paris, Nuremberg, São Paulo, Beijing and many other cities throughout the world.

The United States does have automated train systems, but the majority are inter-terminal trains at major airports. There are some commuter-oriented automated systems as well, such as the Morgantown Personal Rapid Transit system in West Virginia, the Las Vegas Monorail and the Miami Metromover.

Automated rail systems are a reality, and have been for several decades. Although they vary in their degree of automation, there are many safe, fully automated driverless trains with massive capacity that literally move millions of people through their cities every day; far more than could be moved by traditional bus or car networks.

Capacity

Singapore’s North East Line is a completely underground, fully automated rail line. At a cost of S$4.6 billion, its sixteen stations see a daily ridership of over half a million people7. Chicago’s two busiest train lines, the Red and Blue lines, have daily riderships between 180,000 and 250,000 people8. Exact figures for other systems are difficult to find, as most cities publish riders per day or year for their entire networks rather than for individual lines. Still, many large metropolitan areas move millions of people per day via commuter rail.

According to a study published by the Oregon Department of Transportation in 2013, the theoretical maximum throughput of a single lane of traffic, with passenger vehicles filled to capacity (4.2 persons), is 227,000 people per day. If you commute to work in a US city, you may notice that the high occupancy vehicle (HOV) lane with the diamond moniker is mostly empty, which suggests that most vehicles, like yours if you’re stuck in traffic, carry only one occupant. The theoretical maximum for single occupancy (1.4 persons) vehicles is only 76,000 people per lane per day.
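
Both figures imply roughly the same vehicle throughput per lane. As a quick back-of-the-envelope check (my own arithmetic, not a number from the study):

227,000 people/day ÷ 4.2 people per vehicle ≈ 54,000 vehicles per lane per day
54,000 vehicles/day × 1.4 people per vehicle ≈ 76,000 people per lane per day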

Roadway | People per day per lane (pplpd)
2008 AADT from I-405 in LA | 38,000
2008 AADT from I-84 in OR | 39,000
2008 Counts from 401 in Canada | 48,000
Theoretical Veh Capacity (1.4 persons) | 76,000
TCQSM Seated Capacity on Arterials | 139,000
TCQSM Max Capacity on Arterials | 209,000
Theoretical Full Veh Capacity (4.2 persons) | 227,000
6 Walkers per lane | 500,000
Theoretical Bus Capacity (TCQSM) | 1,440,000
Theoretical Bus Capacity (OR Buses) | 1,980,000

Source: Maximum Theoretical Person Capacity in a 24 Hour Period, Oregon Dept of Transportation9.

These theoretical maximums assume a lot of extremes, such as traffic volume never decreasing, all vehicles moving at constant speed and all vehicles being of the same type and capacity. The observed maximums are considerably lower, in the range of 40,000 to 50,000 people per lane per day, depending on the freeway being observed.

Data Graph from Oregon Department of Transportation Study
Data Graph from Oregon Department of Transportation Study

Even the theoretical maximum bus capacity, making all the same assumptions, tops out between 1.5 and 2 million people per lane per day. Chicago’s Red and Blue lines each transport roughly as many people per day on their respective two-track systems as the theoretical maximum for a single lane of carpool traffic. Singapore’s North East Line, also running on two dedicated tracks, has a higher actual capacity than the theoretical maximum of over two lanes of freeway consisting only of carpool traffic, or six and a half lanes of single occupancy vehicles.

Cost

Current estimates put the cost of a Google driverless vehicle at $300,00010. Obviously the research and development cost of a prototype isn’t comparable to the retail cost of a mass-produced car, but it does indicate how much money is being put into the technology behind these vehicles. General Motors has invested over a billion dollars in self driving vehicles with the creation of a tech center in Detroit specifically for autonomous car research11. Even the US government is investing over $4 billion of taxpayer funds in autonomous car research over the next ten years12. That’s a lot of taxpayer money to spend on research that may or may not lead to viable self driving vehicles, as opposed to spending it on trains, which we already know how to build.

The cost of rail-based public infrastructure is high, but it buys proven technology that exists throughout the world. The amount of both public and private funding poured into self driving vehicles and their infrastructure takes away from potential funding for fixing traffic congestion in the present. At the very least, public funds shouldn’t be handed over to private, closed research by the technology and automotive industries, but rather spent on existing transportation infrastructure that has a far greater reach and benefit for all people in a much more achievable time frame.

Software Licensing and Car Ownership

Tesla Model S (white)

Self driving cars are safest on roads with only other self driving cars. In such a system, you would no longer have a need for stoplights at intersections, except at pedestrian crossings. Cars could effortlessly flow by each other in an intersection, with complex navigation systems communicating velocity and direction instructions seamlessly between each vehicle.

But in such a system, autonomous car streets would need to be separated entirely from non-autonomous vehicle streets. Software improvements would need to be distributed to every vehicle for safety reasons. Security would need to be under continuous audit. Testing would need to be held to the same standards used for medical devices and airplanes. One rogue hacker or one bad software update would at best, cripple transportation and, at worst, kill an unfathomable number of people.

It may even make sense, in such a hypothetical system, to not allow drivers to own their vehicles, but instead lease them from an organization that would assume the maintenance, as well as the liability, of operations. Allowing car ownership would require a governing body that ensures vehicles have all their required updates and haven’t been tampered with. Modifications could be very limited, and whether drivers truly own these vehicles comes into question. At this point we’ve closed the loop, and now you effectively have a public transportation system (although it could be privately owned, as many public transport services are).

If people were truly allowed to own their self driving vehicles on either autonomous-only or shared roads, there would be huge legal and liability concerns surrounding what one could and couldn’t do to their own property. It would tear into the core of the long standing hardware/software ownership debate in the open source communities.

Furthermore, if every manufacturer had its own self driving algorithms and implementations, would potentially life saving improvements be required of the other manufacturers? Would there be universal test cases that each manufacturer must pass for every software iteration (e.g. a miniature DARPA Grand Challenge that every car must complete on each software update)? Should all self driving code be required to undergo third party auditing? If and when people crack the security on the software controlling their vehicles, would they be criminally liable in cases where modified software leads to an accident?

Open Driving

It’s important to mention that there are open source automated driving tools in the wild, most of which come with pretty big disclaimers. There’s Autoware, a BSD licensed suite of software that includes computer vision, learning, acceleration/braking/steering and simulation tools13. Openpilot is another research project, which supports autonomous driving on two real world cars. Both projects have disclaimers about how the software is only for research and that individuals are responsible for complying with local laws14. Yet video can be found of Openpilot in use on active roadways15.

Commercial autonomous vehicles use real time operating systems with very low latency. Openpilot is a combination of visiond and python scripts running on an Android cell-phone. The fact that there is video of it being tested on a real roadway should be of great concern to people.

I don’t want to discourage research in the field of computer vision, and these open source projects do make a lot of knowledge and research accessible to the public. In fact, safety might be improved by requiring companies to open parts of their software for inspection, allowing independent researchers to establish a baseline of safety and industrial standards for driving algorithms.

Still, students and researchers have been participating in competitions, such as the DARPA Grand Challenge, for years. Competitions let researchers experiment on closed tracks in controlled environments, whereas these open source projects have researchers testing their devices on public highways.

Legal Status

The legal issues surrounding autonomous vehicles are vague, with Uber disregarding the law and running unlicensed autonomous vehicles in production on the streets of San Francisco16. Uber has a history of simply defying laws to establish its business in new regions. Whereas in the past this has been framed as an issue of workers’ rights or unfair monopolies, unregulated self driving cars go well beyond labor laws in their potential for damage.

In the US, laws regarding autonomous vehicles only exist in California, Florida, Nevada, and the District of Columbia. Since other states don’t explicitly address computer driven vehicles, self driving cars may or may not be legal in other jurisdictions. This vague legal gray zone is exactly what Google exploited for their early prototype vehicles17.

Before we’ll ever see production self driving vehicles in the hands of consumers, there are several legal questions that must be answered. Who is responsible for maintenance and security updates? Will owners be able to modify the software on their cars? And, most importantly, who is liable in the case of an accident or a fatality?

Buses vs Trains

Buses may have more flexible routes, but they have limited capacity and still must share motorways with other vehicles. Some cities have express buses with dedicated lanes and stations. These systems remove the bottlenecks of shared roadways at the expense of flexibility, while still having limited capacity compared to trains.

Even fully electric trolley buses on dedicated expressways still have the energy costs of tires, and they pale in both capacity and maintenance costs next to equivalent rail systems. Most express bus systems must drop from dedicated to shared lanes at some point within metropolitan areas, making it impossible for them to compete with high capacity rail. Some of the world’s more complex light rail systems have trains arriving every two minutes or less at each platform during peak hours. This frequency allows them to approach the theoretical maximum carrying capacity of rail networks far more easily than buses can.

Autonomous self-driving buses would only solve the issue of labor. Although they have the potential to increase safety over human drivers, just like self-driving cars, their energy costs are still considerably higher than trains. At constant speed on level ground, pulling the same load, a steel-wheeled rail engine in motion uses only 5% of the energy required by a tire-based road vehicle in motion. Even accounting for starting and initial acceleration, steel-on-steel vehicles use only 10% of the energy required by large pneumatic-tire road vehicles. What’s even more counterintuitive is that, uniquely in the case of railroads, train resistance (rolling resistance) is inversely proportional to the train’s weight. This means that the heavier the train, the more energy efficient it becomes18.

Conclusions

Moving people from point A to point B in America is becoming increasingly difficult. Traffic and gridlock are the result of single-passenger-per-vehicle highway systems that simply do not scale to meet population growth. Where large cities such as Chicago, New York City and Washington, D.C. have met this demand by maintaining and expanding their networks of light and heavy rail, the majority of American cities have removed their light rail systems to make way for unsustainable car-based transport.

Every city I have lived in had streetcars or trams at one point in time. From Chattanooga to Cincinnati, streetcars carried large numbers of people for decades. Companies such as General Motors, Firestone Tire, Ford and others all pushed for bus-based transportation and the removal of light rail19. The ability of buses to change routes seems like a positive, until it is contrasted with their limited carrying capacity and lack of dedicated right of way.

Trains can be easily automated, and there are several fully autonomous, safe rail networks around the world. A well constructed rail network does take a considerable amount of construction time and funding, but the payoff is a large, scalable transportation system that can move people with greater efficiency, lower cost and less pollution than cars or buses.

In places like Europe, autonomous vehicles could help bridge gaps in areas already served by extensive and complex rail networks. However, in places like the United States, they will not solve core transportation issues in a truly sustainable way. Even if we could mandate dedicated roadways just for fully autonomous vehicles, road networks without traffic lights where computers negotiate every intersection, the problem will eventually hit a bottleneck due to the limited number of people per vehicle.

The capacity of such a system would still be below that of a rail network, and would essentially create a separate system for those who could afford self-driving cars. Alternatively, self driving vehicles would need to be owned and maintained by a central agency and individual users would rent time on them similar to Taxis, Zipcar, Uber and Lyft.

“Last thing I want to see before I die

Is the flash of twenty two inch chromes in my eyes

In America, in America

They’ll bury us with our cars”

-Bury Me with My Car, Ben Sollee (song)20

Self driving cars are cool. They are the romanticism of the science fiction world we were promised in our literature. However, as cool as they may seem, they don’t solve fundamental transportation issues in ways that will scale for the future. Large scale human transportation is a solved problem in most of the world, and it is continually being improved upon. Instead of continually dumping funding into autonomous vehicles, the United States needs to build solid rail transportation systems, both within cities and linking cities via high speed rail. With America so far behind in mass transit, now would be an excellent opportunity to invest in fully automated rail technology. It would be a substantial leap, moving the US from far behind its western counterparts to leading the world in autonomous train technology.

  1. Preliminary Report, ​Highway HWY16FH018. 26 July 2016. National Transportation Safety Board.

  2. Now a Third Tesla Crash Is Being Blamed on Autopilot. 11 July 2016. Silvestro. Road and Track.

  3. Germany Says Tesla Should Not Use ‘Autopilot’ in Advertising. 16 October 2016. Reuters.

  4. Uber’s self-driving cars are already getting into scrapes on the streets of Pittsburgh. 4 October 2016. Griswold.

  5. ‘Driverless’ Tube trains: See inside TfL’s new fleet for London Underground. 9 October 2014. Eleftheriou-Smith. Independent.

  6. LRT patronage up 10.9% and MRT up 4.2%; bus passenger trips rise 3.7%; cab trips down. 10 March 2016. Tan. Straits Times.

  7. Overview North East Line. Retrieved 18 December 2016. SBS Transit. Archived Version

  8. Annual Ridership Report Calendar Year 2012. 12 April 2013. Chicago Transit Authority.

  9. Maximum Theoretical Person Capacity in a 24 Hour Period. 27 November 2013. Bettinardi and Prusakiewicz. Oregon Dept of Transportation.

  10. Google’s Trillion-Dollar Driverless Car – Part 3: Sooner Than You Think. 30 January 2016. Mui. Forbes.

  11. GM to spend $1 billion on self-driving tech center in Detroit . 24 June 2016. Curry. ReadWrite.

  12. U.S. Proposes Spending $4 Billion on Self-Driving Cars. 14 January 2016. Vlasic. New York Times.

  13. cpfl/autoware. Github. Retrieved 19 December 2016.

  14. commaai/openpilot. Github. Retrieved 18 December 2016.

  15. Hack Autonomous Driving into Your Car with Open Source Hardware Comma Neo and Open Pilot Software. 1 December 2016. CNXSoft.

  16. Uber continues self-driving vehicle testing in SF in defiance of DMV. 16 December 2016. Conger. TechCrunch.

  17. Are Self-Driving Cars Legal?. Retrieved 18 December 2016. HG. ORG. Archived Version

  18. Why Rail Has 20X Energy Saving Advantage Over Rubber Tire Road Vehicles - The Science of Locomotion. Retrieved 19 December 2016. Brooklyn Historic Railway Association. Archived Version

  19. Kennedy, 60 Minutes, and Roger Rabbit:Understanding Conspiracy-Theory Explanations of The Decline of Urban Mass Transit. 17 November 1998. Bianco.

  20. Bury Me With My Car. Retrieved 19 December 2016. Sollee. (Song Lyrics)

The American Banking System is Still in the 1990s

Notes of various currency

When I lived in Australia, sending money to an individual or business was as simple as knowing their Bank State Branch (BSB) number and account number. I could go through a web interface, or a phone app, and send $50 to a friend, and it would show up in their account the next morning (or potentially the same day if we used the same bank). The transfer went through the national clearing system overseen by the Australian Payments Clearing Association (APCA), was completely free of fees for individuals and worked with every bank in Australia. Many countries have similar systems, some adding additional security with one-time use Transaction Authentication Numbers (TANs).

Although America has a means of electronic transfer between banks, it’s not available for individual person-to-person transfers. Automated Clearing House (ACH) transfers are only available to certain businesses, and the means of verifying identity and exchanging money through it are slow and convoluted. I honestly hadn’t realized how far behind the American banking system was until I spent several years exposed to various foreign banking systems.

While I was away, I would ask my friends, “Does America support direct person-to-person digital transfers yet?” People would mention that this could be done via PayPal, Square or even Facebook. None of these are direct person-to-person transfers. Many incur fees and require accounts on privately held systems. Although some of these private systems connect to banks directly via ACH, the verification process is terribly outdated. Many simply use the existing credit/debit card networks, incurring additional fees that are passed on to end users and businesses.

In many countries, peoples’ Tax ID numbers and bank account numbers are not considered secret. In the US, Americans have no national identification numbers or Tax IDs. Instead the Social Security Number (SSN), despite at one time having the words “Not for Identification” printed on Social Security cards, is abused in such a way that it also serves as a means of security verification for banks, government offices, employers, insurance companies, credit reporting agencies and background checks. Security aware Americans often attempt to protect both their SSNs and bank account numbers, as public knowledge of either can lead to identity theft. Even though ACH transactions are audited, run in batches and traceable, the secrecy around bank account information prevents any implementation of electronic person-to-person transfers.

Furthermore, when a private entity such as PayPal wants to interact directly with a bank account, it often requires an authorization process in which it deposits two small amounts (typically only a few cents) into a bank account provided by the user. The deposits can take anywhere from one to three days to appear, at which point the account is verified by entering those amounts into PayPal’s web interface. This is entirely backwards from other countries, where direct transfers to businesses and individuals can all be done entirely through one’s home banking website or mobile application.

In America, direct person-to-person transfers still require a check. The physical check doesn’t even need to be deposited at a bank. Thanks to the Check 21 Act, passed in 2003, an image of the front and back of a check is just as legally valid as the check itself. Many banks have cellphone applications which can photograph a check to initiate the transfer (the original check should then be voided or destroyed).

A physical check contains considerably more information than electronic transfers. When a person issues someone a check, it often has the bank routing number, their account number, their full name and address (typically used for debt collection purposes when bad checks are issued). The other countries I’ve lived in do not have check photo scanning built into their mobile applications because, as I’ve stated, they have direct person-to-person electronic transfers. They don’t need to resort to outdated mechanisms such as physically signing and photographing a check.

Taxes

The Australian government provides free, supported tax software for its residents and citizens. When I used it back in 2013, it wasn’t the prettiest interface, but it was very usable and straightforward. In New Zealand, most people don’t file taxes at all. Everyone’s income information is collected by the Inland Revenue Department (IRD), and taxes are automatically calculated and withheld. At the end of a tax year, residents can log into the official website and review their information for mistakes. If a refund is due, it can be transferred to a bank account from the same web interface. Many European nations have similar systems.

In comparison, the US tax process is overly complex and convoluted. The US Internal Revenue Service (IRS) does allow people to file for free using paper forms, but it provides no official government software. Instead, companies like Intuit (makers of TurboTax) and H&R Block have collectively spent over $5 million lobbying against better solutions. They pushed for laws with deceptive names such as the Free File Act of 2016, which claims to allow lower income families to e-file for free but blocks the IRS from creating its own public software. Furthermore, few of the people eligible for the no-charge filing programs even use them, as the system is confusing and pushes people towards the paid options instead1.

Living in the 90s

Dealing with money electronically in America is archaic. Many banks have on-line bill payment systems. Although they may work out deals with larger utilities for electronic payment, for things like rent paid to an individual, these on-line systems often print and mail a paper check to the payee, which is then either electronically or physically deposited at a bank. Some banks allow personal transfers by e-mail or SMS/text, but this typically requires the recipient to create an account on the sender’s bank’s website and then enter details to complete an ACH transfer. Only some banks support this, and it’s far from universal.

I’m not sure if we’ll ever see a more robust banking system in the United States. Payment gateways depend on taking a percentage of a large number of transactions to grow their profit margins. The Federal Reserve controls the ACH standard, and neither it nor the banks have any incentive to modernize their systems and allow direct person-to-person payments. With the current influence of financial industry lobbyists, it’s unlikely we’ll see legislation mandating such changes any time soon.



How Patreon is Disrupting YouTube and other Ad Supported Services

Patreon Logo

In recent months, there has been considerable debate among video creators on YouTube about the future of generating revenue through Google’s extremely popular streaming video service. Advertisers are backing away from more controversial content, and YouTube has begun to demonetize several types of videos. Services like Flattr, which attempted to let content consumers fill the tip jars of creators, have slowly fizzled. Yet out of those ashes has come a new service known as Patreon, which allows fans to directly fund the continued production of many varieties of art.

Ad keywords on YouTube are priced, matched and attached to content by a sophisticated yet closed, proprietary set of real time services. This makes it nearly impossible to get accurate numbers on how much content creators make per view, as it literally varies for everyone1. Furthermore, YouTube is a walled garden: although it’s a very powerful free service, it’s completely controlled by Google and their policies. In recent years YouTube has demonetized several types of videos, including those that deal with political content or issues that may be considered controversial by members of their advertising base.

“…On TV they typically avoid going too far down certain paths, for the most part, because advertisers don’t like offensive stuff … YouTube beckoned us all in with this monetization, but now, in the past year, they’ve changed their minds and have started imposing heavy restrictions on themes and content. Guess who’s calling this one? It’s those lovely advert coin-men in their lovely lovely suits. For many creators this is our job and many of us would actually have to simply adhere to the rules and make some changes to continue … now that’s a different approach to censorship. Don’t use force, just simply take the coins away. Now I could bland it up and start making middle of the road nice stuff, but who wants that? Nobody! … Now YouTube can do what the hell they want. It’s their platform. If they think a green man’s nipple on the least realistic looking cartoon ever drawn squirting milk is bad for business, then I’m in the wrong place. Advertising is dog shit anyway. I don’t like all that subliminal suggesting; it’s poison…” -David Firth2

Flattr is a service based out of Sweden that launched in 2010. It allows users to set aside a certain amount of funds each month to tip content creators. It initially integrated with APIs from services like YouTube and Twitter so that donations could happen automatically when a user liked or gave a thumbs up to content. However, most of these hooks have since been removed, as Twitter and YouTube both disallowed this type of tracking via their official APIs.

Flattr is limited by the fact that it only provides a virtual tip jar, often for a specific video, tweet, photo or blog post. What Patreon does is provide a system for fans to subscribe directly to creators. It provides either a constant source of monthly revenue, or creators can ask supporters to contribute per work produced (e.g. per blog post or video). It cuts through the current debate about the ethics of ad blocking software by removing advertising from the equation. Excluding payment gateway and transfer fees, Patreon takes 5% of successful contributions to fund its service3.

For many writers and artists who are just starting out or have low readership (like myself), Patreon can be similar to Flattr, acting as a tip jar and providing a few extra dollars each month for fans to show their appreciation. For those who are more popular and have been creating content for years, it can provide a direct way for fans to fill the financial gaps created by YouTube over the past few years.

Walled Gardens and Profitability

Back when YouTube started in 2005, it was not the only player in the video hosting game. One of the likely factors in YouTube surviving, where its rivals stumbled, was its scalability engineering. The original stack included Python, MySQL (with sharding) and Lighttpd. The founding staff would spend each day identifying performance bottlenecks, fixing those bottlenecks, drinking and (maybe) sleeping. Engineering manager Cuong Do explained in a 2007 talk on scalability that this process continued for months on end in YouTube’s early days4. Serving video efficiently is by no means a trivial operation. What made YouTube so successful, leading to its eventual buyout by Google, was not that it had zero downtime, but simply that it didn’t crash as often, and had less downtime, than its competitors.

There isn’t a lot of information to be found on YouTube’s current technology stack. It’s most likely proprietary and held in secret by Google. In 2012, Google claimed that the service served 4 billion videos daily, yet most of those videos didn’t generate any revenue; only 3 billion videos per week, roughly 10%, were monetized by advertising5. Even by 2014, with posted revenues of over $4 billion, YouTube didn’t contribute to Google’s earnings, as the company claimed it was still “roughly break[ing]-even6.” And by 2016, Google still refused to release revenue numbers specific to YouTube. Susan Wojcicki, CEO of YouTube, mentioned how YouTube was still in “investment mode7.”

Ad Revenue Sustainability

YouTube is in a very interesting position, ingesting nearly 65 years of content per day1. In its earlier days, normal accounts were restricted to 15-minute video clips (which is why many older videos are multi-part 15-minute segments combined via a playlist), though that restriction was lifted years ago. Rivals like Vimeo offered longer clips, higher bitrates for High Definition (HD) content, and paid accounts for greater upload capacity. YouTube has caught up in recent years with not only HD content, but 4K and 3D video hosting as well. Another thing that put YouTube ahead of Vimeo was its related videos section. Combined with massive amounts of new daily content, the related list allowed users to quickly move from one video to the next. Although Vimeo added its own version of this feature later, YouTube provided a means for new videos to be recognized and accessible.

Services like Patreon now provide a direct means of revenue, meaning that creators whose content fits perfectly into YouTube’s revenue model may opt not to use YouTube’s ad placement system. Some creators have direct sponsorships and place ads directly into their videos or mention products at the beginning or end of their clips. Others use a combination of user donations and in-video ads, but the point is that in all of these cases, YouTube is now essentially hosting peoples’ content for free. Those creators also have less of an incentive to use YouTube’s monetization system at all.

With the utterly massive amount of new content ingested and videos streamed to viewers every day, one has to ask the question: is YouTube’s business model sustainable? It’s unlikely we’ll see a future where Google limits the amount of uploads or starts charging extra for higher levels of video quality. If the dot-com bust of 2001 has taught us anything, it’s that charging a fee for features that were initially free is a great way to kill your business. With YouTube being such an iconic name and brand in the minds of people all over this planet, any types of changes aimed at long-term sustainability or profitability must be made carefully.

Disruptive Services

YouTube’s current shortfall seems to stem from sticking to the traditional business model of funding entertainment with ad revenue. Apple iTunes, Google Play and Amazon Music all take a 33% cut of sales, similar to the traditional CD industry that preceded them. Bandcamp disrupted this system by charging only 15% for the same services (10% for high revenue earners who hit $5,000 USD), while also providing higher quality lossless audio formats.

Patreon’s service is similarly disruptive and fulfills a deep need for artists and writers. Its minimal overhead makes it seem, at least on the surface, like a business geared more towards helping independent creators than its own bottom line. Patreon offers some limited posting abilities, but it’s not designed to host content directly. Instead posts, both public and exclusive to patrons, often link to other hosting services like YouTube, Bandcamp, SoundCloud or independent blogs and websites. By not relying on advertising, creators now have a direct link to their audience. By not trying to host content directly, Patreon gives creators the choice to use whichever services they prefer.

At its core, Patreon is a very simple concept. It takes the somewhat feature-lacking idea of PayPal Subscriptions and wraps it in a basic posting system that restricts access by subscriber tier, while providing goals and rewards. It’s like a Kickstarter or Indiegogo for long-term, sustained work instead of one-off projects. The website does help with discovering other projects and creators that may match one’s interests, similar to related videos on YouTube or the project listings on Kickstarter. There are also many open source alternatives to Patreon, such as SnowDrift, Liberapay, Salt and Gratipay. Such projects are useful for people who either want to self-host their funding campaigns or use systems specifically designed for open source software.

We are likely to see other services like Patreon in the future, potentially targeted at specific types of content and creators. The thing that has really caused Patreon to grow recently, besides YouTube’s controversial changes, is that it doesn’t strongly couple individuals to delivery services. If YouTube decides to start deleting old content or changing its video policies, a filmmaker with a substantial following on Patreon can simply move to a different platform. If DeviantArt decides to close its doors or Yahoo decides to shut down Flickr, photographers with followings on Patreon can use the platform to tell their fans their artwork is moving elsewhere. It creates a direct conduit between creators and their fans, and it allows funding from just a few people to let an artist provide their work, potentially advertising-free, to everyone.

  1. This Video Made $2,573 at Auction. How Ads Work on YouTube. (Post-Adpocalypse Updated Estimate). 5 Apr 2017. CGP Grey. (Video)

  2. Not Relying On Advertising. 12 April 2017. David Firth (Video) 

  3. How do you calculate fees?. Patreon. (Retrieved 23 May 2017) 

  4. Seattle Conference on Scalability: YouTube Scalability. 23 June 2007. Google Tech Talks. (Video) 

  5. Exclusive: YouTube hits 4 billion daily video views. 23 Jan 2012. Oreskovic. Reuters. 

  6. YouTube: 1 Billion Viewers, No Profit. 25 Feb 2015. Winkler. Wall Street Journal. Archived Version 

  7. YouTube CEO Says There’s ‘No Timetable’ For Profitability. 18 Oct 2016. Rao. Fortune. 

Leaving Full Time Jobs

Car Driving Away

I used to work at the University of Cincinnati, and whenever I got frustrated at staff meetings, I’d threaten to move to Australia. After a $300 application fee and a surprisingly short approval process, I had a holiday work visa which allowed me to live and work in Australia for a full year. My manager led me to our director’s office. With my resignation letter on his desk, my director simply asked, “Do you want more money?” to which I responded, “I’m moving to Australia.” There were confused looks from the two of them, awkward silence and finally, “No, really … I’m moving to Australia.” It was the first time I had left the relative security of a full-time position, and it wouldn’t be the last.

When we’re children, we often dream of all the possibilities of what we can be. I was a pretty boring kid. I recall an assignment in the 2nd grade where I had to draw what I wanted to be when I grew up. A crude drawing of a power plant’s cooling towers filled the page, emulating my father who was an electrical engineer at a nuclear site. Later I would dream of being a police officer, a video game creator, a writer, a pilot and countless other potential futures as all children often do.

As I grew up and prepared to graduate high school and start university, I had gained a reputation of being a computer nerd. I was one of a handful of people in my high school graduating class that understood IP addresses, how domain controllers worked and realized that Y2K was not going to be that big a deal. My cousin was a computer scientist, and that seemed to be the path I was about to embark on. I took a part-time job in high school working as a web developer for a local dotcom company designing HTML templates and creating graphics in an archaic program known as LView Pro.

The long hours and poor work conditions took their toll, and I eventually left the company. I wasn’t sure if software engineering was a field I really wanted to pursue. I remember my father once told me that he wanted to go to art school when he was younger, but he chose engineering instead. When I told my sister I was considering journalism, she told me I needed to think of my future family (I’m currently 35 and have no children). My university removed the requirement to have a minor from our program, so I dropped courses that would have at least given me a minor in English Journalism. I graduated in 2004 with a Bachelor of Science in Computer Science.

“I said, ‘I’d like to be a writer.’ And they said, ‘Choose something realistic.’ So I said, ‘Professional wrestler.’ And they said, ‘Don’t be stupid.’ See, they asked me what I wanted to be, then told me what not to be. And I wasn’t the only one. We were being told that we somehow must become what we are not, sacrificing what we are to inherit the masquerade of what we will be. I was being told to accept the identity that others will give me. And I wondered, what made my dreams so easy to dismiss?” -Shane Koyczan1

When work in Information Technology (IT) failed to satisfy me, I began to seek ways out of the monotony. Changing positions and living in different countries helped keep my mind stimulated, both by learning about new fields, taking on new challenges in my career, and through the external influences of living in new places. I’ve always considered the standard of my work to be fairly high, as companies that hire me for contracts often offer me full-time positions. In Australia, I worked two contracts, one as a developer and another as a system administrator, both for a travel agency. I was asked to come on full time, at the end of my second contract, and was told the company would even sponsor a new work visa. I turned down the position, destined for New Zealand with another holiday work visa and the desire to explore.

Due to an unfortunate event, I made the decision to once again leave a full-time position. However this time there was no real exit strategy. I took off across New Zealand and Australia, volunteering to perform at spoken word poetry nights, and continued to travel, living out of two bags for the next ten months. After running low on funds, I ended up in Seattle, taking a contract as a Scala developer. Although it was a contract position, it was pretty much full-time work for fifteen months. At some point I realized I had gotten too complacent, and accumulated more than I would like, being a minimalist. I set out once again. This time I would be driving across the US, without an absolute destination in mind.

The money I’ve spent on sabbaticals could have easily paid for over half the cost of a house. The American dream, to the dissatisfaction of some of my family members (who I realize only want the best for me), is something I have mostly cast off. We live in a society where we have come to accept that high income careers are often coupled with demoralizing work conditions. We poke fun at this reality with comics such as Dilbert, or its lesser known and darker cousin We The Robots. Movies like Office Space and shows like The Office let us laugh at the absurdity of the jobs we take to maintain a lifestyle that leaves so many people in our society unfulfilled.

“American poet Walt Whitman gave our multiplicity memorable expression. ‘I am large. I contain multitudes,’ by which Whitman meant that there [are] so many interesting, attractive and viable versions of oneself; so many good ways one could potentially live and work but very few of these ever get properly played out … The Scottish philosopher Adam Smith, in the Wealth of Nations … explained how what he termed the division of labour, massively increases collective productivity. In a society where everyone does everything, only a small number of shoes, houses, nails, bushels of wheat … [are] ever produced and no one is especially good at anything. But if people specialize in just one small area—making rivets, shaping spokes, manufacturing rope, bricklaying, etc. They become … much faster and more efficient in their work and collectively the level of production is greatly increased. By focusing our efforts we lose out on the enjoyment of multiplicity yet our society becomes overall far wealthier and better supplied with the goods it needs … in other words tiny cogs in a giant efficient machine, hugely richer but full of private longings …“ -Why is Work so Boring, The School of Life2

Despite the struggle I, and many other adults, have with finding fulfillment in being cogs in the machine of western society, I acknowledge the benefits of my career choice. I can work in other countries without specific certifications (such as are required by doctors, legal representatives and other specializations). My specific career allows for short-term contracts. My background and experience has built my level of skill to be desirable and, therefore, somewhat lucrative. With software engineering penetrating almost every market and aspect of society, there is an endless number of fields to work in and learn about. Although I specialize in software engineering and system administration, my lack of job loyalty means I have been able to work and learn about everything from health care to debt collection, foreign governments to telecommunications, and e-commerce to identity management.

Travis and Tatyana's Recreational Vehicle
Travis and Tatyana's Recreational Vehicle

Despite the droning monotony of office work, there are a lot of advantages to the IT field. While I personally may not have a family to take care of, I have two sets of friends who decided to embrace minimalism, scale down to living in recreational vehicles and work remotely. They travel with their children, with the goal of settling once their children are old enough to start school. Another friend of mine in New Zealand simply reduced to two hours of work a week for a few months, took a considerable amount of unpaid leave, traveled for months throughout Europe, and eventually returned to his job refreshed and ready to tackle new problems.

Despite all this flexibility, or maybe thanks to it, I have embraced simply leaving full-time work. Casting off that security is not something which is easily done in American culture. The notion of work is embedded in the fabric of American mythos. Our European neighbors often take time away from careers to reinvest in their well being, and have both the vacation time and social infrastructure for this to be possible. In America, we don’t even have a means of providing decent healthcare until one is old enough to qualify for Medicare, and has therefore worn out one’s usefulness as a member of the workforce.

Leaving the workforce is not an easy endeavor. With jobs moving overseas and positions disappearing due to automation, many Americans struggle to simply feed and house themselves from month to month. Even those who are well off and make considerable amounts of money often feel trapped by the social mores that lead us to believe everyone must work and invest their effort for the greater good of others and society. For example, leaving work to attempt to found a startup is considered more respectable, even if one fails, than simply leaving for the sake of physical or mental health.

If one takes a step back to question why we continue to struggle, we might find ourselves in the grasp of absurdism. We use our careers to add value to a life that struggles to find meaning against the universe. We climb up the corporate ladder, bamboozled into some racket where we are selling insurance, only to wake up forty years later to find the bucket we’ve invested all our self-worth into has no bottom. Be careful not to fall into the spiral of regret. Your life is what it is. No amount of wishful thinking allows us to turn back time.

Still, we don’t have to continue down the paths we have been traveling for years or decades, convinced that is the only thing we know and are and will ever be. We are not trapped into terrible interview questions about our five year plans, as if we know which plane we’ll one day board or where it will eventually land. If we step back from the life we often feel has been handed to us, we can often see our choices, both those that led us to our current state and those which can move us forward in our future.

We can ask ourselves the hard questions, and see beyond the veil to realize we are the ones who are really controlling the machine. If we are content with the meaning we give to our lives: our careers, families, children and loved ones, then there is nothing wrong with that. We can reassure the people around us that they matter to our lives, and that what we do is meant to spend time on the important bonds and friendships we build. However, if we find ourselves in despair, struggling to find meaning and purpose, we should not be afraid to seek change, while keeping our responsibilities, and to change the parts of our lives which currently leave us unfulfilled.

“I stopped in the middle of that building and I saw—the sky. I saw the things that I love in this world. The work and the food and time to sit and smoke. And I looked at the pen and said to myself, what the hell am I grabbing this for? Why am I trying to become what I don’t want to be? What am I doing in an office, making a contemptuous, begging fool of myself, when all I want is out there, waiting for me the minute I say I know who I am! Why can’t I say that, Willy?” -Death of a Salesman

  1. Shane Koyczan: To This Day … for the bullied and beautiful. Feb 2013. Shane Koyczan. TED. 

  2. Why is Work so Boring?. 22 March 2017. The School of Life. (Video) 

Cloud at Cost Part II: The Unsustainable Business Model

Cloud at Cost Main Page Screenshot

Back in 2013, a startup known as Cloud at Cost attempted to run a hosting service where users paid a one-time cost for Virtual Machines (VMs). For a one-time fee, you could get a server for life. I had purchased one of these VMs, intending to use it as a status page. However, their service has been so unreliable that it’s a shot in the dark as to whether a purchased VM will be available from week to week. Recent changes to their service policy are attempting to recoup their losses through a $9 per year service fee. It’s a poor attempt to salvage a bad business model from a terrible hosting provider.

It seemed like a novel concept at the time. Instead of charging people a recurring fee for hosted services, charge them a single fee for life. However, there’s a reason hosting providers charge by the month. Providing reliable data centers, servers and customer service is an expensive endeavor. In my original review of Cloud at Cost, I mentioned how slow their early support/ticket system was and how I had to fix several of my own issues with the VMs.

Over the years I’ve occasionally attempted to use my Cloud at Cost VM for various tasks. In April of 2014 I couldn’t access my server, and it turned out it was shut down. Cloud at Cost had changed many of their servers to automatically power off by default, most likely in an attempt to free resources from people who weren’t actively using their servers. People who were using these VMs for hosting would have had their servers shut down without notice, forcing them to either open tickets or use the confusing control panel to attempt to restart them and disable the auto-shutdown.

In January of 2017 I attempted to login to my VM again, only to find it once again shut down. Logging into the control panel produced errors every time I tried to restart it. It took half a month to get an answer to my support ticket, at which time I was told the VM couldn’t be recovered and I’d have to rebuild it.

Finally in the summer of 2017, I started getting e-mails about a suspension warning. A $9 per year fee had been added to Cloud at Cost accounts, thereby breaking their one-time fee for life model.

24 Hours Server Suspension Warning E-mails from Cloud at Cost

My VM was no longer responding. When I logged into the control panel, I suddenly got a new terms of service agreement.

New Terms of Service from Cloud at Cost

It’s important to note this particular part of the agreement:

THESE TERMS AND CONDITIONS MAY BE UPDATED AND CHANGED WITHOUT NOTICE TO THE CUSTOMER.1

I suspect a similar clause was in the original agreement. If not, Cloud at Cost could face legal trouble from their customers. However, I find it highly doubtful anyone put enough trust in Cloud at Cost to host significant infrastructure with them, which would make such legal action not financially worthwhile.

For new customers, this $9/year cost is not even presented anywhere in the checkout process. It is buried in section nine of their terms of service, which is only linked to during checkout:

9.18 Customers with a onetime payment service is subject to an annual maintenance fee of $9 which will be invoiced 12 months after using our service1.

I shouldn’t be surprised Cloud at Cost is still in business. A few weeks ago I did some work for someone who was using a terrible web hosting provider I gave up on years ago. With these types of low-cost hosting services, the cliché “You get what you pay for” often holds true.

In my original review, I noted that it’d take approximately three years for the one-time cost to equal that of the monthly hosting fees. I made the argument that people would essentially be betting the company would be around that long. However, with Cloud at Cost randomly shutting down servers, randomly deleting servers, taking forever to answer tickets and a host of other problems, I highly doubt any of their customers have gotten anywhere near three years of use out of their products. Searching for reviews of Cloud at Cost produces nothing but posts by angry customers and tweets claiming the service is a scam.

The recent barrage of e-mails claiming 24 hour server suspension seems to be a last-ditch effort to raise capital. They are realizing the one-time fee model for cloud computing simply doesn’t work, or if it does, their particular implementation is terrible and unsustainable. Every infrastructure problem they’ve had over the past few years feels as if it’s an attempt either to free resources by shutting down unused services, or to con customers into providing additional revenue.

Cloud at Cost was an interesting concept when it came out, and many people flocked to their low cost plans just to have a spare box to run things on. I doubt we will ever see a postmortem on the company when it does eventually fail. I suspect there were many over-generous cost estimates early on in the company’s life. Their control panel, web interface and ticketing system are terrible, and haven’t improved significantly over the life of the company. They may simply lack the caliber of developers and system engineers to work on improvements, or only generate enough revenue to keep what currently exists running. I’m also curious if the founders and managers in the company honestly thought they’d be able to provide these services indefinitely with the one-time fee model, or if they knew and planned for shutting down unused instances and the other shady tactics that have earned them their current reputation.

Is it possible to offer a one-time fee based service that is truly sustainable? To do so, a company would have to plan for infinite growth, either current customers continually adding to their purchases or a continual flow of new customers. There are unavoidable recurring costs when it comes to electricity, personnel and hardware. Old hardware often has to be upgraded or replaced after several years, and existing custom software needs to be continually improved upon in order to attract a fresh customer base. It may be possible to create a business based on one-time fee cloud computing, but it would require high initial fees, very careful planning and a top-of-the-line VM provisioning system; things Cloud at Cost has never had, and most likely never will.

  1. Cloud at Cost Terms and Conditions. Retrieved on 26 May 2017. Mirror  2

Bee2: Wrestling with the Vultr API

Vultr

No one enjoys changing hosting providers. I haven’t had to often, but when I have, it involved manual configuration and copying files. As I’m looking to deploy some new projects, I’m attempting to automate the provisioning process, using hosting providers with Application Programming Interfaces (APIs) to automatically create virtual machines and run Ansible playbooks on those machines. My first attempt involved installing DC/OS on DigitalOcean, which met with mixed results.

In this post, I’ll be examining Bee2, a simple framework I built in Ruby. Although the framework is designed to be expandable to different providers, initially I’ll be implementing a provisioner for Vultr, a new hosting provider that seems to be competing directly with DigitalOcean. While their prices and flexibility seem better than DigitalOcean’s, their API is a mess of missing functions, polling/waiting and interesting bugs.

Writing a Provisioning System

When working on the open source project BigSense, I created an environment configuration tool called vSense that set up the appropriate Vagrant files to be used both in development (using VirtualBox as the provider) and in production (using libvirt/KVM as the provider). Vagrant isn’t really intended for production provisioning. While newer versions of Vagrant remove the shared insecure SSH keys in the provisioning process, for vSense I had Ansible tasks that would ensure default keys were removed and new root passwords were auto-generated and encrypted.

Terraform is another open source tool from the makers of Vagrant. On its surface, it seems like a utility designed to provision servers on a variety of hosting companies. It supports quite a few providers, but the only Vultr plugin available at the time of writing is the terraform-provider-vultr by rgl. The plugin is unmaintained, but there are several forks, at least one of which is attempting to make it into the official tree1.

Rather than wrestle with an in-development Terraform plugin, I instead decided to write Bee2, my own Ruby provisioning scripts using an unofficial Vultr Ruby API by Tolbkni. Based on some previous attempts at writing a provisioning system, I tried to keep everything as modular as possible so I could extend Bee2 to other hosting providers in the future. Once servers have been provisioned, it can also run Ansible playbooks to apply a configuration to each individual machine.

Vultr API Oddities

I ran into a couple of issues with the Vultr API, which I attempted to work around as best I could. There seem to be a lot of missing properties, and some basic system configuration tasks require poorly engineered combinations of API calls. In this post, I’ll examine the following issues:

  • The SUBID, used to uniquely identify all Vultr resources (except for those that it doesn’t, such as SSHKEY)
  • Vultr allows the provisioning of permanent static IPv4 addresses that can be attached to and detached from servers, but for IPv6, it only reserves an entire /64 subnet and assigns a seemingly random IP address from that subnet upon attaching it to a server.
  • Reserved IPv4 addresses can be set up when creating a server, but IPv6 addresses must be attached after a server is created with IPv6 enabled.
  • Enabling IPv6 support on a server assigns it an auto-generated IPv6 address that cannot be removed via the API.
  • Occasionally, attaching an IPv6 address requires a server reboot.
  • Private IPs are automatically generated, but they are not auto-configured on the server itself. They are essentially a totally worthless part of the API.
  • Duplicate SSH keys can be created (neither names nor keys seem to be unique).

Bee2: The Framework

I started with a basic Vultr provisioning class, with a provision method that completes all the basic provisioning tasks. In the following example, we see all the clearly labeled steps needed to provision a basic infrastructure: installing SSH keys, reserving and saving static IP addresses, deleting servers (if doing a full rebuild), creating servers, updating DNS records and writing an inventory file for Ansible.

class VultrProvisioner
  ...
  def provision(rebuild = false)
    ensure_ssh_keys
    reserve_ips
    populate_ips
    if rebuild
      @log.info('Rebuilding Servers')
      delete_servers
    end
    ensure_servers
    update_dns
    cleanup_dns
    write_inventory
  end
  ...
end

The unofficial Vultr Ruby library I’m using is a very thin wrapper around the Vultr REST API. All of the Ruby library’s functions return a hash with :status and :result keys that contain the HTTP status code and JSON return payload respectively. There is a spelling mistake, as the Ruby library has a RevervedIP function for the Vultr ReservedIP call. The API key is global instead of a class variable, and all the functions are static, meaning only one Vultr account/API token can be used at a time.
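For illustration, a raw call with the library looks roughly like the sketch below. This is not Bee2 code; the exact name of the global API key setter is my assumption from the gem’s README, and the payload shape simply mirrors what the wrapper functions later in this post expect.

require 'vultr'

# Assumption: the gem exposes a single global API key setter.
Vultr.api_key = 'InsertValidAPIKeyHere'

res = Vultr::Server.list
res[:status]   # => HTTP status code, e.g. 200 on success
res[:result]   # => parsed JSON payload, a hash of servers keyed by SUBID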

Overall, the library seems simple enough that I probably should have just implemented it myself. Instead, I created two wrapper functions to use around all Vultr:: calls. The first, v(cmd), will either return the :result or bail out and exit if the :status is anything other than 200. The second function, vv(cmd, error_code, ok_lambda, err_lambda), will run ok_lambda on success or err_lambda if the specified error_code is returned. v() and vv() can be chained together to deal with creating resources while avoiding duplicates.

private def v(cmd)
  if cmd[:status] != 200
    @log.fatal('Error Executing Vultr Command. Aborting...')
    @log.fatal(cmd)
    exit(2)
  else
    return cmd[:result]
  end
end

private def vv(cmd, error_code, ok_lambda, err_lambda)
  case cmd[:status]
  when error_code
    err_lambda.()
  when 200
    ok_lambda.()
  else
    @log.fatal('Error Executing Vultr Command. Aborting...')
    @log.fatal(cmd)
    exit(2)
  end
end
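As a usage sketch, v() wraps calls that simply must succeed, while vv() handles calls where one specific status code is expected and recoverable. The reboot example below is a placeholder of mine, not code from Bee2; 412 is the code Vultr uses elsewhere for precondition-style failures.

# Abort the entire run if listing servers fails for any reason.
servers = v(Vultr::Server.list)

# Treat a 412 from a reboot request as recoverable rather than fatal.
vv(Vultr::Server.reboot({'SUBID' => subid}), 412, -> {
  @log.info('Reboot requested')
}, -> {
  @log.warn('Reboot refused with a 412; the server may still be locked')
})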

In addition, many of the API calls are asynchronous and return immediately. Commands requiring resources to be available will not block and wait, but outright fail. Therefore we need a wait function to poll and ensure previous commands have completed successfully. The following function is fairly robust: it can poll until a certain field reaches a specific value, or wait for a certain value to change or no longer be present.

def wait_server(server, field, field_value, field_state = true)
  while true
    current_servers = v(Vultr::Server.list).map { |k,v|
      if v['label'] == server
        if (field_state and v[field] != field_value) or (!field_state and v[field] == field_value)
          verb = field_state ? 'have' : 'change from'
          @log.info("Waiting on #{server} to #{verb} #{field} #{field_value}. Current state: #{v[field]}")
          sleep(5)
        else
          @log.info("Complete. Server: #{server} / #{field} => #{field_value}")
          return true
        end
      end
    }
  end
end
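For example, after creating a machine, the provisioner can block until Vultr reports it as ready. The field names below ('status', 'power_status') are my assumptions about the server/list payload, not values documented by Bee2.

# Wait until the new server's 'status' field reads 'active'.
wait_server('web1', 'status', 'active')

# Wait until 'power_status' is no longer 'stopped'; passing false as the
# fourth argument flips the check to "wait for the value to change away".
wait_server('web1', 'power_status', 'stopped', false)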

Configuration

Configuration is done using a single YAML file. For now, the only provisioner supported is Vultr and it takes an API token, a region code, a state file (which will be generated if it doesn’t exist) and SSH keys (which do need to exist; they will not be auto-generated).

The inventory section indicates the names of the files which will be created for and used by Ansible for configuration management. One contains the publicly accessible IP addresses and the other the private IP addresses. The public inventory will be used to bootstrap the configuration process, establishing an OpenVPN server and setting up firewall rules to block off SSH ports on the public IP addresses. Once a VPN connection is established, further provisioning can be done via the private inventory.

Each server in the servers section requires a numerical plan ID and os ID. Lists of valid IDs can be retrieved using Vultr::Plans.list and Vultr::OS.list respectively (a small helper script for this is sketched after the configuration example below). An IPv4 address and a /64 IPv6 subnet will be reserved and assigned to each server. DNS records will be automatically created for both the public and private IP addresses in their respective sections. Additionally, any DNS entries listed in web will have A/AAAA records created for both the domain name and the www subdomain for its respective base record.

Finally, a playbook can be specified for configuration management via Ansible. All of the playbooks should exist in the ansible subdirectory.

provisioner:
  type: vultr
  token: InsertValidAPIKeyHere
  region: LAX
  state-file: vultr-state.yml
  ssh_key:
    public: vultr-key.pub
    private: vultr-key
inventory:
  public: vultr.pub.inv
  private: vultr.pri.inv
servers:
  web1:
    plan: 202 # 2048 MB RAM,40 GB SSD,2.00 TB BW
    os: 241 # Ubuntu 17.04 x64
    private_ip: 192.168.150.10
    dns:
      public:
        - web1.example.com
      private:
        - web1.example.net
      web:
        - penguindreams.org
        - khanism.org
    playbook: ubuntu-playbook.yml
  vpn:
    plan: 201 # 1024 MB RAM,25 GB SSD,1.00 TB BW
    os: 230 # FreeBSD 11 x64
    private_ip: 192.168.150.20
    dns:
      public:
        - vpn.example.com
      private:
        - vpn.example.net
    playbook: freebsd-playbook.yml
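The numeric plan and os values above come from Vultr’s plan and OS listings. A throwaway script along the lines of the sketch below can dump them; it is not part of Bee2, and the api_key setter and the 'name' fields are assumptions about the gem and the v1 API.

#!/usr/bin/env ruby
# Print the numeric IDs Vultr uses for plans and operating systems, for use
# in the servers section of settings.yml.
require 'vultr'

Vultr.api_key = ENV['VULTR_API_TOKEN']  # hypothetical environment variable

puts 'Plans:'
Vultr::Plans.list[:result].each { |id, plan| puts "  #{id}: #{plan['name']}" }

puts 'Operating Systems:'
Vultr::OS.list[:result].each { |id, os| puts "  #{id}: #{os['name']}" }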

The Vultr Provisioner

Within the Vultr API, everything has a SUBID. These are unique identifiers for servers, reserved IP subnets, block storage, backups and pretty much everything except SSH keys. The API often requires a SUBID to attach one resource to another, sometimes necessitating additional lookups. Some functions within the Vultr API have duplicate checking and will error out when trying to create duplicate resources. Other parts of the API require you to iterate over current resources to prevent creating duplicates. Some functions validate that all parameters have acceptable values, while others will fail silently.

SSH Keys

There is no uniqueness checking within the Vultr API for SSH keys. You can create multiple keys with the same name, the same key or the same combination of the two. Within the Bee2 framework, I use the name as the unique identifier. SSH_KEY_ID is a constant defined to be b2-provisioner. The following function ensures this key is only created once.

def ensure_ssh_keys
  key_list = v(Vultr::SSHKey.list).find { |k,v| v['name'] == SSH_KEY_ID }
  if key_list.nil? or not key_list.any?
    @log.info("Adding SSH Key #{SSH_KEY_ID}")
    @state['ssh_key_id'] = v(Vultr::SSHKey.create({'name' => SSH_KEY_ID, 'ssh_key' =>@ssh_key}))['SSHKEYID']
    save_state
  end
end

Private IPs

When creating a server instance, one of the things the Vultr API returns, if private networking is enabled, is a private IP address. I was puzzled as to why you couldn’t specify your own private IP address in the create method, until I realized this address is not actually assigned to your VM. It’s simply a randomly generated IP address within a private subnet, offered as a suggestion. The official API documentation indicates this address still has to be assigned manually to the internal network adapter. Originally I had the following to save the generated private IP addresses:

# Save auto-generated private IP addresses
  v(Vultr::Server.list).each { |k,v|
    if v['label'] == server
      @state['servers'][server]['private_ip'] = {}
      @state['servers'][server]['private_ip']['addr'] = v['internal_ip']
      @log.info("#{server}'s private IP is #{v['internal_ip']}'")
    end
  }
  save_state

I removed this code and instead decided to specify the private IP addresses and subnet in settings.yml. It makes sense for the API to allow private networking to be enabled, which provides a second virtual network adapter inside the VM. However, randomly generating a private IP address seems worthless, and it moves something that should happen in the provisioning phase down into the configuration management layer.

Private IP via Ansible Configuration Management

For Private IPs in Bee2, I’ve created an Ansible role to support IP assignment for both Ubuntu and FreeBSD.

---
  - set_fact: private_ip="{{ servers[ansible_hostname].private_ip }}"
  - block:
      - set_fact: private_eth=ens7
      - include: ubuntu.yml
    when: ansible_distribution in [ 'Debian', 'Ubuntu' ]
  - block:
      - set_fact: private_eth=vtnet1
      - include: freebsd.yml
    when: ansible_distribution == 'FreeBSD'

For Ubuntu, we use /etc/network/interfaces to configure the private interface, relying on the fact that Vultr always creates the private interface as ens7, as defined in the facts above.

---
  - blockinfile:
      path: /etc/network/interfaces
      block: |
        auto {{ private_eth }}
        iface {{ private_eth }} inet static
          address {{ private_ip }}
          netmask 255.255.255.0
    notify: restart networking

On FreeBSD, network adapters are set up in /etc/rc.conf, and Vultr always assigns the private adapter as vtnet1.

---
  - name: Setup Private Network
    lineinfile: >
      dest=/etc/rc.conf state=present regexp='^ifconfig_{{ private_eth }}.*'
      line='ifconfig_{{ private_eth }}="inet {{ private_ip }} netmask 255.255.255.0"'
    notify: restart netif

We’ll need playbooks that reference this private-net Ansible role. Ubuntu 17 only ships with Python 3 by default and FreeBSD places the Python interpreter in /usr/local/, so we need to configure the interpreter path for both operating systems. For Ubuntu machines, we’ll create ubuntu-playbook.yml, which is referenced in the configuration file.

---
- hosts: all
  vars:
    ansible_python_interpreter: /usr/bin/python3
  vars_files:
    - ../{{ config_file }}
  roles:
   - private-net

The following is the freebsd-playbook.yml for our FreeBSD instance:

---
- hosts: all
  vars_files:
    - ../{{ config_file }}
  vars:
    - ansible_python_interpreter: /usr/local/bin/python
  roles:
    - private-net

IPv6

The server/create function allows for attaching a reserved IPv4 address to a virtual machine via the reserved_ip_v4 parameter. However, there is no reserved_ip_v6 parameter. When creating the machine, the enable_ipv6 parameter must be set to yes (not true, as I discovered the hard way, since the Vultr API doesn’t validate this parameter and will not return an error), and a random IPv6 address will then be assigned to the machine. I contacted support and learned this address cannot be deleted from the machine via the API. Furthermore, when attaching the reserved IPv6 subnet, the Vultr API will assign an entire IPv6 /64 subnet to the instance and assign it a random IP address within that space.
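A create call that enables IPv6 might look roughly like the sketch below. The parameter names reflect my reading of the v1 server/create endpoint rather than code lifted from Bee2, so treat them as assumptions; the important detail is that enable_ipv6 takes the string 'yes'.

# A hedged sketch of creating a server with IPv6 and private networking.
subid = v(Vultr::Server.create({
  'DCID' => dcid,                      # numeric region ID (e.g. for LAX)
  'VPSPLANID' => plan,                 # plan ID from settings.yml
  'OSID' => os,                        # OS ID from settings.yml
  'label' => server,
  'enable_ipv6' => 'yes',              # true is silently ignored
  'enable_private_network' => 'yes',
  'reserved_ip_v4' => ipv4,            # reserved address from reserve_ips
  'SSHKEYID' => @state['ssh_key_id']
}))['SUBID']

Once the machine exists, Bee2 attaches the reserved IPv6 subnet after the fact, rebooting the VM when the API insists on it: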

# Attach our Reserved /Public IPv6 Address
ip = @state['servers'][server]['ipv6']['subnet']
@log.info("Attaching #{ip} to #{server}")
vv(Vultr::RevervedIP.attach({'ip_address' => ip, 'attach_SUBID' => subid}), 412, -> {
  @log.info('IP Attached')
}, -> {
  @log.warn('Unable to attach IP. Rebooting VM')
  v(Vultr::Server.reboot({'SUBID' => subid}))
})

This means that every time a machine is rebuilt, it will have a different IPv6 address (although Bee2 will update the DNS records with that new address). I understand that assigning an entire /64 to a host is common practice for IPv6, and allows for several IPv6 features to work correctly. However, it’d be convenient if the Vultr API could also provide guarantees for the final static reserved /128 address which is given to the server.

One possible workaround is to place the lower part of the IPv6 address in the settings.yml file, have Vultr assign the subnet and then have Ansible replace the auto-assigned /128 address Vultr gives the server. This would ensure rebuilt servers always get the same IPv6 address (although it would not match the IP shown in the Vultr web interface). For now, Bee2 simply lets Vultr assign an IPv6 address from the reserved subnet and updates the DNS record. Those running Bee2 over IPv6 connections may have to flush their DNS cache or wait for older records to expire before running configuration management or SSHing to the remote servers.

Finally, when attaching a reserved IPv6 subnet to a machine, Vultr occasionally will return a 412, indicating that the machine must be rebooted. As shown in the previous code sample, this can be done via the API using the server/reboot function.

{:status=>412, :result=>"Unable to attach IP: Unable to attach subnet, please restart your server from the control panel"}

Deleting/Rebuilding Machines

Deleting a machine with a reserved IPv4 address doesn’t immediately release the IP address. The following function deletes all the servers we’ve defined in the configuration file, and then waits for existing reserved IP addresses to detach from current VMs. Without the wait loop, a rebuild would immediately fail with an error message indicating the address referenced in reserved_ip_v4 is still in use.

def delete_servers
  current_servers = v(Vultr::Server.list).map { |k,v| v['label'] }
  delete_servers = @state['servers'].keys.reject { |server| not current_servers.include? server }
  delete_servers.each { |server|
    @log.info("Deleting #{server}")
    v(Vultr::Server.destroy('SUBID' => @state['servers'][server]['SUBID']))
    while v(Vultr::RevervedIP.list).find { |k,v| v['label'] == server }.last['attached_SUBID']
      @log.info("Waiting on Reserved IP to Detach from #{server}")
      sleep(5)
    end
  }
end

Another issue with the Vultr API is that virtual machines cannot be deleted for five minutes after they’ve been created. Developing against the API can therefore become very time-consuming, with lots of waiting around when working on anything involving server/create and server/destroy.

{:status=>412, :result=>"Unable to destroy server: Servers cannot be destroyed within 5 minutes of being created"}
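One way to soften this during development is to treat that 412 as a transient condition rather than a fatal error, for instance with a small retry wrapper built on vv(). This is a sketch of mine, not part of Bee2.

# Keep retrying server/destroy until the five minute window has passed.
def destroy_when_allowed(subid)
  vv(Vultr::Server.destroy('SUBID' => subid), 412, -> {
    @log.info("Destroyed server #{subid}")
  }, -> {
    @log.warn('Server is too new to destroy; sleeping 60 seconds before retrying')
    sleep(60)
    destroy_when_allowed(subid)
  })
end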

Putting it All Together

Using Bee2 is pretty straightforward. The command line arguments require a configuration file, and then allow for provisioning (-p) servers. Combining -p and -r will rebuild servers, destroying them first if they already exist. Finally, -a will run Ansible against either the public or private inventory IP addresses.

Usage: bee2 [-v] [-h|--help] [-c <config>] [-p [-r]]
    -c, --config CONFIG              Configuration File
    -p, --provision                  Provision Servers
    -v, --verbose                    Debug Logging Output Enabled
    -r, --rebuild                    Destroy and Rebuild Servers During Provisioning
    -a, --ansible INVENTORY          Run Ansible on Inventory (public|private)
    -h, --help                       Show this message

The provisioning, rebuilding and configuration management tasks can all be combined into a single command.

./bee2 -c settings.yml -p -r -a public
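The flags map naturally onto Ruby’s standard OptionParser. The wiring might look something like the sketch below, though the actual Bee2 implementation may differ.

require 'optparse'

# A hedged sketch of the command line handling; option names mirror the
# usage output above, but this is not the actual Bee2 source.
options = {}
OptionParser.new do |opts|
  opts.banner = 'Usage: bee2 [-v] [-h|--help] [-c <config>] [-p [-r]]'
  opts.on('-c', '--config CONFIG', 'Configuration File') { |c| options[:config] = c }
  opts.on('-p', '--provision', 'Provision Servers') { options[:provision] = true }
  opts.on('-v', '--verbose', 'Debug Logging Output Enabled') { options[:verbose] = true }
  opts.on('-r', '--rebuild', 'Destroy and Rebuild Servers During Provisioning') { options[:rebuild] = true }
  opts.on('-a', '--ansible INVENTORY', 'Run Ansible on Inventory (public|private)') { |i| options[:inventory] = i }
  opts.on('-h', '--help', 'Show this message') { puts opts; exit }
end.parse!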

Conclusions

Overall, the Vultr API is usable, but it definitely has some design issues that can result in frustration. There were a few moments where I wasn’t sure if I had discovered some bugs. However, most of the issues I encountered either involved my own code, or not waiting for a service to be in the correct state before calling another action. The Vultr support staff were mostly helpful and responsive during the weekdays and standard business hours, with requests made on the weekend often having to wait until Monday.

Although I was able to successfully write a Bee2 provisioner for the Vultr API, it did require quite a bit of work. Their current API shows signs of underlying technical debt, and I’m curious whether issues with their current platform have driven some of the design decisions in the API. This is only the first version of their API, so hopefully we’ll see improvements in future versions that streamline some of the more complicated parts of the provisioning process.

This concludes our basic Vultr provisioner for Bee2. The specific version of Bee2 used in this article has been tagged as pd-vultr-blogpost, and the most current version of Bee2 can be found on Github. Future posts will include further work with Ansible and Docker, establishing an OpenVPN server for our private network, securing the VMs and using Docker to run various services.

  1. Vultr Provider Issue #2611. HashiCorp. GitHub. Retrieved 5 July 2017. 
