Booting up a fresh blogosphere

My recent piece, Anywhere But Medium, has gotten a fair amount of play. Nothing on TechMeme, but I didn’t expect them to cover it; they generally promote VC-backed tech companies, and the message in the ABM piece was very much counter to the current thinking in VC. I hope eventually that will change, that their investments will embrace the open web, and that their companies will create products that feed back into the web, products that can be built on to create new products without forcing every new venture to start over from scratch. I think it’ll be a much more powerful and healthy ecosystem then. The current ecosystem is eventually going to run out of room to grow. I suspect that’s going to happen pretty soon.

Anyway, there is probably enough agreement “out there” to create a critical mass for a newly invigorated blogosphere to boot up along the lines of the one that started this whole thing in the late 90s to early 2000s. What we need is a little new technology, and support from one or two vendors.

It’s interesting to see Mike Caulfield try to get under the covers of WordPress to be able to directly edit the code behind the rendering. My good friend Daniel Bachhuber, who apparently is back at work at Automattic (I didn’t know!), says there are reasons it’s not possible. There are privacy concerns: people use shortcodes in WordPress and might not want them revealed publicly. I’ll have to learn more about this.

But there is another approach: have WordPress accept as input the URL of a JSON file containing the source code for a post, and then do its rendering exactly as if that post had been written in WordPress’s editor. That would give us exactly what we need to have the best of all worlds: widespread syndication and control over our own writing and archive, all at a very low cost in terms of storage and CPU.
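A minimal sketch of what that flow could look like, assuming a hypothetical JSON format for the post source; the field names here are invented for illustration, not the actual format of the author’s tool:

```python
import json

# A hypothetical JSON source for a post. The real format used by the
# author's in-development tool isn't shown in this piece, so these
# field names are assumptions.
post_json = """
{
  "title": "Booting up a fresh blogosphere",
  "text": "My recent piece, Anywhere But Medium, has gotten a fair amount of play.",
  "pubDate": "2016-01-26"
}
"""

def render_post(source: str) -> str:
    """Render a JSON post source the way an editor-produced post would be."""
    post = json.loads(source)
    return "<article><h1>{}</h1><p>{}</p></article>".format(
        post["title"], post["text"])

html = render_post(post_json)
print(html)
```

The CMS would fetch the JSON from the supplied URL instead of reading it inline, but the rendering step would be the same.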

I wrote about this in an earlier piece.

Here’s an example of the JSON for this post. I’m already storing it publicly with my in-development blogging tool. Since it hasn’t yet been deployed outside my own server, there’s still time to change the format, with relatively little breakage.

Just want to put this idea out there for people who are thinking about this stuff. APIs are not necessary. Just a new syndication format. We could even use an existing format, but since we’re mostly working in JavaScript these days, I think JSON is also a fine way to go. 😉

Original URL:

Original article

IBM makes LinuxONE Mainframe Go Faster

ServerWatch: When most people think about mainframes, they tend to think about big iron that is very stable, though it isn’t particularly agile or flexible. That’s a myth that IBM is now aiming to dispel through the rapid pace of both hardware and software innovation it is landing on the LinuxONE class of mainframes.


Firefox 44 Arrives With Push Notifications

An anonymous reader writes: Mozilla today launched Firefox 44 for Windows, Mac, Linux, and Android. Notable additions to the browser include push notifications, the removal of RC4 encryption, and new powerful developer tools. Mozilla made three promises for push notifications: “1. To prevent cross-site correlations, every website receives a different, anonymous Web Push identifier for your browser. 2. To thwart eavesdropping, payloads are encrypted to a public / private keypair held only by your browser. 3. Firefox only connects to the Push Service if you have an active Web Push subscription. This could be to a website, or to a browser feature like Firefox Hello or Firefox Sync.” Here are the full changelogs: Desktop and Android.


Read more of this story at Slashdot.


Data Scientist Training for Librarians – re-skilling libraries for the future

Loud and clear: Data Scientist Training for Librarians (DST4L) is a wonderful concept. It’s an experimental course developed at the Harvard-Smithsonian Center for Astrophysics Library with the aim of training librarians to respond to the growing data needs of their communities. We all know the story: the Internet happened and the amount of information and data exploded. And it’s right in front of everybody’s nose, the neighbor kid’s as well as the scholar’s. Information and data in any form are the building blocks of science and knowledge, and with their rapid increase, the need for tools and skills to tame and analyze them grows. This development has changed the way academia works, and when academia changes, the library should pay attention and think about its options.

DST4L is a direct response to the development above. And what an awesome response.

DST4L – Copenhagen 9th – 11th of September 2015

DST4L has been held three times in the States and was now held for the first time in Europe, at the library of the Technical University of Denmark just outside Copenhagen. 40 participants from across Europe were ready to get their hands dirty over a three-day marathon of relevant tools for data archiving, handling, sharing and analyzing. See the full program here and check the #DST4L hashtag on Twitter.

Day 1: OpenRefine

OpenRefine is an open source desktop application for data cleanup and transformation to other formats. It is similar to spreadsheet applications and can work with spreadsheet file formats. Unlike spreadsheets, no formulas are stored in the cells; instead, formulas are used to transform the data, and each transformation is done only once. Transformation expressions can be written in Google Refine Expression Language (GREL), Jython (i.e. Python) and Clojure.

How can it be used in libraries and academia? When scraping, for instance, tweets from Twitter, you often end up with a dirty dataset with a lot of noise that you want to weed out before starting to analyze your data. OpenRefine can be used for this.
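The kind of one-off cleanup that OpenRefine automates can be sketched in plain Python; this is a toy illustration with made-up tweet rows, not the OpenRefine tool itself:

```python
# Toy dataset: scraped tweets with whitespace noise, a duplicate,
# and an empty row -- the kind of dirt a real scrape produces.
rows = [
    "  Great talk at #DST4L!  ",
    "Great talk at #DST4L!",
    "",
    "OpenRefine is handy for cleanup",
]

def clean(rows):
    """Trim whitespace, drop empty rows and deduplicate -- the sort of
    transformation OpenRefine would express as a GREL expression."""
    seen, out = set(), []
    for row in rows:
        row = row.strip()
        if row and row not in seen:
            seen.add(row)
            out.append(row)
    return out

cleaned = clean(rows)
print(cleaned)  # two unique, trimmed tweets remain
```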

40 librarians from across the world gathered at DTU Library, Copenhagen, to get their hands dirty with Data Scientist Training for Librarians.

At the end of day 1 we threw a social event with drinks ‘n data at our Digital Social Science Lab (DSSL) at the Faculty Library of Social Sciences, Copenhagen University Library. DSSL project head Michael Svendsen and I gave a quick talk on the concept of DSSL, and then there were bubbles and lots of talk before we hit a restaurant for something to eat. A great ending to a great day.

Drinks ‘n data @ Digital Social Science Lab

Day 2: GitHub

GitHub is a web-based collaborative platform for code management and code review for open source and private projects. Public projects are free, while private ones come with a fee. According to GitHub, it has 9 million users and over 21 million repositories, which makes it the largest host of source code in the world.

How can it be used in libraries and academia? For one thing, GitHub can function as a strong collaborative platform within various academic disciplines, and libraries have an opportunity to support this with know-how and skills on GitHub. For libraries themselves, GitHub can be used for developing and sharing good stuff, liberated from time and place. For instance: how many LibGuides on, say, sociology are there around the world, each built up from scratch? A lot. If libraries used GitHub to share and develop just one prototype LibGuide for sociology, it could serve as a strong starting point for all LibGuides on sociology around the world.

The GitHub Octocat @ DST4L

Day 3: Python

Python is a programming language whose syntax allows programmers to express concepts in fewer lines of code than would be possible in languages like Java or C++. Python interpreters are available for installation on many operating systems, allowing Python code execution on a wide variety of systems. Using third-party tools, such as Py2exe or PyInstaller, Python code can be packaged into stand-alone executable programs for some of the most popular operating systems, allowing the distribution of Python-based software for use on those environments without requiring the installation of a Python interpreter.

How can it be used in libraries and academia? Being a programming language, Python can be used for many things, but one use that stands out for me is web scraping. The web contains huge amounts of data relevant for researchers and students. Let’s say you want to scrape tweets on the Danish election to analyze them on various parameters. Python can help you do that, and once you have your dataset you can clean it up in OpenRefine before analyzing it in, for instance, NVivo.
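As a toy illustration of the scraping step, here is a sketch using only the Python standard library; the page markup and the `tweet` class are invented for the example (scraping Twitter for real would need its API or more robust tooling):

```python
from html.parser import HTMLParser

# Inline sample page; a real scraper would fetch this over HTTP
# with urllib or the requests library.
SAMPLE = ('<html><body><p class="tweet">Election day! #dkpol</p>'
          '<p class="tweet">Remember to vote #dkpol</p></body></html>')

class TweetParser(HTMLParser):
    """Collect the text of <p class="tweet"> elements from a page."""

    def __init__(self):
        super().__init__()
        self.in_tweet = False
        self.tweets = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and ("class", "tweet") in attrs:
            self.in_tweet = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_tweet = False

    def handle_data(self, data):
        if self.in_tweet:
            self.tweets.append(data)

parser = TweetParser()
parser.feed(SAMPLE)
print(parser.tweets)
```

The resulting list could then be exported to CSV and cleaned up in OpenRefine, as described above.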

Here are some good blog posts on how to get started with web scraping in Python:

Easy web scraping with python

Web scraping 101 with Python

Hands-on practice with Python

Ending notes and the future of DST4L

DST4L is important because it clearly addresses some of the key skills librarians within academia will need to gain to continue to create value for their institutions – and not only on a strategic level: you actually learn how to use stuff like OpenRefine, GitHub and Python. That said, the learning curve is pretty steep, at least it was for me, and I’m no master of the things I learned. But what is important is that I now have a basic understanding of what we are capable of doing with these tools, and we are standing on a very solid platform for building a service for our university and faculties on these matters.

DST4L has been brought to Copenhagen by a couple of great data enthusiasts (hands up for Chris Erdmann, Ivo Grigorov and Mikael Elbæk), and I’m thankful that people like them put effort and time into a concept that brings so much value to the table. But the question that stands after three days of Data Scientist Training is: who will put in the time and energy to make sure the next DST4L happens? DST4L is important for the future of libraries, but to survive, I guess it has to be lifted out of the hands of awesome enthusiasts and into a sustainable organizational structure that provides the world of librarianship with great data scientist training. Maybe a task for OCLC or another major worldwide library player.

For now: thanks for some great and valuable DST4L days in Copenhagen. I hope there will be many, many DST4L sessions in the future.



Walmart Launches OneOps, An Open-Source Cloud And Application Lifecycle Management Platform

Walmart (yes, that Walmart) is launching a new open source DevOps platform for cloud and application lifecycle management. OneOps, which was developed by Walmart Labs, is meant to help developers write and launch their apps faster and make maintaining them easier. The company first announced its plans to open source the service last year. “Our mission is to give our customers the…


OneOps – Open-source cloud ops platform from Walmart

Four years ago, the Walmart Global eCommerce system was a monolithic application, deployed once every 2 months. This represented an unsustainable rate of innovation given the competitive climate. Walmart recognized the need to fundamentally transform technology delivery to compete in the digital marketplace.

@WalmartLabs was founded in an effort to re-invent how products are designed and
delivered within the eCommerce division. A project code named Pangaea paved the
way. Walmart’s eCommerce site was re-built following a service oriented
architecture, while adopting a DevOps culture, and migrating to cloud based
infrastructure. Knowing that providing developers cloud infrastructure alone
only reveals the next layer of friction, managing application lifecycle, OneOps
was acquired early in 2013 and has been under active internal development since.

Today the Walmart eCommerce platform is hosted in some of the largest OpenStack
clouds and is managed exclusively via OneOps. On a typical day there are now over
1,000 deployments, executed on-demand by development teams, each taking only minutes
on average to complete.

The three necessary ingredients for success were:

  • A service oriented architecture for the site, without which the
    complexities of coordinating integration of the monolithic release
    dominated the schedule.
  • Localized and empowered ownership of and accountability for each service
    through building a DevOps culture.
  • Access to infrastructure at the fingertips of the developers, what they
    needed, when they needed it. This was accomplished through OneOps application
    lifecycle management backed by cloud infrastructure, enabling teams to focus on
    the most valuable aspect of their job – the code.


How We Automate Our Infrastructure

Growing a business is hard and growing the engineering team to support that is arguably harder, but doing both of those without a stable infrastructure is basically impossible. Particularly for high growth businesses, where every engineer must be empowered to write, test, and ship code with a high degree of autonomy.

Over the past year, we’ve added ~60 new integrations (to over 160), built a platform for partners to write their own integrations, released a Redshift integration, and have a few big product announcements on the way. And in that time, we’ve had many growing pains around managing multiple environments, deploying code, and general development workflows. Since our engineers are happiest and most productive when their time is spent shipping product, building tooling, and scaling services, it’s paramount that the development workflow and its supporting infrastructure are simple to use and flexible.

And that’s why we’ve automated many facets of our infrastructure. We’ll share our current setup in greater detail below, covering these main areas:

  • Syncing dev environments
  • Mirroring dev and prod environments
  • Developing locally
  • Deploying to production

Let’s dive in!

Syncing Dev Environments

As the code complexity and the engineering team grow, it can become harder to keep dev environments consistent across all engineers.

Before our current solution, one big problem our engineering team faced was keeping all dev environments in sync. We had a GitHub repo with a set of shell scripts that all new engineers executed to install the necessary tools and authentication tokens onto their local machines. These scripts would also set up Vagrant and a VM.

But this VM was built locally on each computer. If you modified the state of your VM, then to get back to the same VM as the other engineers, you’d have to rebuild everything from scratch. And when one engineer updated the VM, you had to tell everyone on Slack to pull changes from our GitHub VM repo and rebuild. An awfully painful process, since Vagrant can be slow.

Not a great solution for a growing team that is trying to move fast.

When we first played with Docker, we liked the ability to run code in a reproducible and isolated environment. We wanted to reuse these Docker principles and experience in maintaining consistent dev environments across a growing engineering team.

We wrote a bunch of tools to set up the VM for new engineers, and to upgrade it or reset it to the base image state. When our engineers set up the VM for the first time, it asks for their GitHub credentials and AWS tokens, then pulls and builds from the latest image on Docker Hub.

Docker VM setup

On each run, we make sure that the VM is up-to-date by querying the Docker Hub API. This process updates packages, tools, etc. that our engineers use everyday. It takes around 5 seconds and is needed in order to make sure that everything is running correctly for the user.
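The up-to-date check can be sketched as a digest comparison; the JSON shape below is a simplified stand-in for what the Docker Hub API returns, and the surrounding tooling is Segment’s own and not shown here:

```python
import json

def needs_update(local_digest: str, hub_response: str) -> bool:
    """Compare the locally recorded image digest against the one the
    registry reports. The caller would obtain hub_response from the
    Docker Hub API over HTTPS; here it is passed in as a string."""
    remote = json.loads(hub_response)
    return remote.get("digest") != local_digest

# Simulated API response; a real check would fetch this from Docker Hub.
response = '{"name": "latest", "digest": "sha256:abc123"}'

up_to_date = needs_update("sha256:abc123", response)   # False: no pull needed
stale = needs_update("sha256:old000", response)        # True: pull fresh image
print(up_to_date, stale)
```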

Additionally, since our engineers use Macs, we switched from a boot2docker VirtualBox machine to a Vagrant-hosted boot2docker instance so that we could take advantage of NFS to share volumes between the host and guest. Using NFS provides massive performance gains during local development. Lastly, NFS allows any changes our engineers make outside of the VM to be instantaneously reflected within the VM.

With this solution we have vastly reduced the number of dependencies that need to be installed on the host machine. The only things needed now are Docker, Docker Compose, Go, and a GOPATH set.

Mirroring Dev and Prod Environments

The ideal situation is dev and prod environments running the same code, yet separated so that code running in dev can never affect code running in production.

Previously, we stored the AWS state (generated by Terraform) alongside the Terraform files, but this wasn’t a perfect system. For example, if two people asynchronously plan and apply different changes, the state gets modified twice, and whoever pushes last has a hard time figuring out the merge collisions.

We achieved mirroring of staging and production in the simplest way possible: copying files from one folder to another. Terraform enabled us to reduce the number of hours needed to modify the infrastructure, deploy new services, and make improvements.
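That copy step can be sketched in a few lines; the staging/production folder layout below is an assumption for illustration, not Segment’s actual repository structure:

```python
import os
import shutil
import tempfile

def mirror(src: str, dst: str) -> list:
    """Copy every .tf file from the staging folder into the production
    folder, returning the names of the files copied."""
    os.makedirs(dst, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src)):
        if name.endswith(".tf"):
            shutil.copy(os.path.join(src, name), os.path.join(dst, name))
            copied.append(name)
    return copied

# Demonstrate with a throwaway directory tree.
root = tempfile.mkdtemp()
staging = os.path.join(root, "staging")
os.makedirs(staging)
with open(os.path.join(staging, "main.tf"), "w") as f:
    f.write("# terraform config")
with open(os.path.join(staging, "README.md"), "w") as f:
    f.write("docs, not copied")

copied = mirror(staging, os.path.join(root, "production"))
print(copied)  # only the .tf file is mirrored
```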

We integrated Terraform with CircleCI, writing a custom build process and ensuring that the right amount of security was taken into consideration before applying.

Staging and prod environments

At the moment, we have one single repository hosted on GitHub named infrastructure, which contains a collection of Terraform scripts that configure environmental variables and settings for each of our containers.

When we want to change something in our infrastructure, we make the necessary changes to the Terraform scripts and run them before opening a new pull request for someone else on the infra-team to review it. Once the pull request gets merged to master, CircleCI will start the deployment process: the state gets pulled, modified locally, and stored again on S3.

Developing Locally

Seeding Databases

When developing locally, it’s important to populate local data stores with dummy data, so our app looks more realistic. As such, seeding databases is a common part of setting up the dev environment.

We rely on CircleCI, Docker, and volume containers to provide easy access to dummy data. Volume containers are portable images of static data. We decided to use volume containers because the data model and logic becomes less coupled and easier to maintain. Also just in case this data is useful in other places in our infrastructure (testing, etc., who knows).

Loading seed data into our local dev environment occurs automatically when we start the app server in development. For example, when the app (our main application) container is started in a dev environment, app’s docker-compose.yml script will pull the latest seed image from Docker Hub and mount the raw data in the VM.

The seed image on Docker Hub is created from a GitHub repo, seed, which is just a collection of JSON files holding the raw objects we import into our databases. To update the seed data, we have CircleCI set up on the repo so that any publish to master will build (grabbing our mongodb and redis containers from Docker Hub) and publish a new seed image to Docker Hub, which we can then use in the app.
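The import step amounts to parsing each JSON seed file into a named collection. Here is a sketch with an in-memory dict standing in for the mongodb/redis containers; the file names and fields are illustrative, not Segment’s actual schema:

```python
import json

# Raw seed objects as they might appear in the JSON files of the seed
# repo; the field names here are invented for the example.
SEED_FILES = {
    "users.json": '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]',
    "projects.json": '[{"id": 10, "owner": 1}]',
}

def load_seed(files):
    """Parse each JSON seed file into an in-memory collection keyed by
    the file's base name, mimicking the import into a datastore."""
    store = {}
    for filename, raw in files.items():
        collection = filename.rsplit(".", 1)[0]
        store[collection] = json.loads(raw)
    return store

db = load_seed(SEED_FILES)
print(sorted(db))  # collections available to the dev app
```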

Spinning Up Microservices

Due to the data-heavy nature of Segment, our app already relies on several microservices (db service, redis, nsq, etc). In order for our engineers to work on the app, we need to have an easy way to create these services locally.

Again, Docker makes this workflow extremely easy.

Similar to how we use seed volume containers to mount data into the VM locally, we do the same with microservices. We use the docker compose file to grab images from Docker Hub to create locally, set addresses and aliases, and ultimately reduce the complexity to a single terminal command to get everything up and running.

Deploying to Production

If you write code, but never ship it to production, did it ever really happen?

