New AI could help write your next textbook, using Penn State technology

Flying an airplane, telemarketing, and translation are all tasks that artificial intelligence can automate. Writing textbooks may be next, Penn State University has announced.

Original URL:

Original article

HAproxy in the era of Microservices

“Microservices” is the latest architecture buzzword, used to describe perhaps one of the most interesting architecture styles of this decade.

What are microservices?

To use Martin Fowler’s definition:

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies

To simplify it further, you can think of microservices as small and autonomous services or independent processes that work together within a bounded context, communicating with each other over lightweight transports like HTTP.

How small is micro? As with everything, it depends. Some claim that a microservice should consist of a single actor. Others, like Jon Eaves, claim it should be something you can complete in at most 2 weeks. I would think a general rule of thumb is that it should be small enough to be easily maintained by a small team (or a dev), and that it should focus on doing one thing and doing it very well.

The benefits of such an architecture are plentiful. For example, it makes it easier to adopt new technologies faster and to grow your team. It makes it easier to adopt the appropriate technology for solving a particular problem (e.g. you could have a microservice written in Scala and using Neo4j for storage, alongside another microservice written in Go and using Cassandra in the backend). It limits the risks of a complete system shutdown as most pieces are spread across a fleet of services across several machines. It makes it easier to scale on smaller machines, which can be a huge cost saver.

Nevertheless, such an approach adds complexity in different areas, one of which is routing.

Unified routing

Assume a relatively complex domain broken into multiple bounded contexts. Each context can have anywhere from 2 to N microservices, each doing one specific thing within the domain. Once those services are scaled out, the number of running instances grows even larger.

If you’re trying to consume those services, you probably don’t want to keep track of them all. To go even further, if I’m writing a mobile application that needs to communicate with those many services, I don’t want to have to maintain the addresses of all those microservices. What would be better for me would be the ability to program against 1 base URL per bounded context and somehow have it figure out which microservice needs to be called based on a header.

This is where HAproxy comes in.

HAproxy with microservices

As its name suggests, HAproxy is a high availability proxy server and load balancer that works for both TCP and HTTP. More information on it can be found in its docs.

Amongst its many features is the concept of an Access Control List (ACL), which can be used to determine which backend to send a request to. An ACL can inspect the request headers and the URL, amongst other things.

To go back to our example of a mobile application wanting a single contact point, the request the dev sends over the wire could include a header (e.g. “x-microservice-app-id”) that HAproxy would then use to determine which backend to route to. Note: that contact point could itself be tied to an A record with multiple IP addresses pointing to load balancers, to avoid having a single point of failure.

Here’s an example configuration on how to do so.
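A minimal sketch of what such a configuration might look like, rendered by a small Python script from a hypothetical service table. The service names, addresses, and the render_haproxy_config helper are assumptions for illustration and are not taken from the original post; only the header name comes from the example above.

```python
# Hypothetical service table: routing header value -> backend servers.
SERVICES = {
    "users": ["10.0.0.11:8080", "10.0.0.12:8080"],
    "orders": ["10.0.0.21:8080"],
}

HEADER = "x-microservice-app-id"  # header name from the example above


def render_haproxy_config(services, header=HEADER, port=80):
    """Render one frontend with an ACL per service, plus one backend each."""
    lines = ["frontend api", f"    bind *:{port}", "    mode http"]
    for name in sorted(services):
        # Match the routing header case-insensitively against the service name.
        lines.append(f"    acl is_{name} hdr({header}) -i {name}")
        lines.append(f"    use_backend {name}_backend if is_{name}")
    for name in sorted(services):
        lines += ["", f"backend {name}_backend", "    mode http",
                  "    balance roundrobin"]
        for i, address in enumerate(services[name], start=1):
            lines.append(f"    server {name}{i} {address} check")
    return "\n".join(lines) + "\n"


config = render_haproxy_config(SERVICES)
print(config)
```

In this sketch a request with no matching header has no backend to go to; a real setup would probably add a default_backend line to the frontend.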

The header names would be the one thing the client would have to maintain. Alternatively, if a convention were used, the header value could be computed, removing the need for the client to maintain the potentially numerous header names.

Following this approach requires you to keep your haproxy configuration in sync with your deployments. There are various tools that can help, such as Marathon, which has utilities to produce haproxy configurations similar to the one in the above example, and Kong, which is an API gateway.


New platform simplifies OpenStack cloud networking

Cloud access

OpenStack is a popular open source tool for creating public and private clouds and is used by big companies around the world.

To make running OpenStack systems easier, open source network specialist Akanda is launching a new version of its Astara platform that radically simplifies the complexity and scale of implementations.

“Astara’s first release as an official OpenStack project is an exciting one for OpenStack operators,” says Henrik Rosendahl, CEO, Akanda. “The goal of Astara is to make Networking and DevOps’ lives easier. With tremendous community support and momentum for the platform throughout its first year, Astara is the answer for massively simplified OpenStack networking stack that can replace traditional — and expensive — single vendor lock-in”.

The latest Astara release is compatible with the latest ‘Liberty’ OpenStack release and includes a new load balancer driver which allows OpenStack operators to configure the platform to load and manage only the resources they choose. Neutron virtual network resources are now much more quickly provisioned onto appliance VMs via a new service that manages pools of hot-standby appliance VMs.

The new release also offers integration and support for Dynamic Lightweight Network Virtualization, which gives OpenStack operators a complete, OpenStack-ready stack. There are active high availability and scaling improvements, plus syncing to Liberty’s global requirements, ensuring smooth installation into system namespaces shared by other OpenStack projects. For clouds running OpenStack Kilo or Juno releases, the new Astara release can be completely backported.

More information about the latest release is available on the Akanda blog.

Image Credit: Chaiyapop Bhumiwat / Shutterstock


Show HN: CV Boilerplate – Easing the Process of Building a CV Using LaTeX

A boilerplate to ease the pain of building and maintaining a CV or résumé using LaTeX. The perfect companion to letter-boilerplate.


Separating presentation from content makes life easier. The typical content of a CV is a perfect fit for a yaml file due to its structured nature:

name: Friedrich Nietzsche
address:
- Humboldtstraße 36
- 99425 Weimar
- Prussia
# ...
experience:
- years: 1879--1889
  employer: Freiberufler
  job: Freier Philosoph
  city: Sils-Maria
- years: 1869--1879
  employer: Universität Basel
  job: Professor für klassische Philologie
  city: Basel

That makes it super easy to update a CV while keeping a consistent structure.

Thanks to pandoc, we can then access our data from template.tex by using a special notation. Iterating on repetitive data structures becomes trivial:
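As a rough illustration of what that iteration does, here is a Python sketch that loops over the work-experience entries and emits one block of LaTeX per entry. In a pandoc template the same loop would be written with pandoc's $for(...)$ ... $endfor$ notation. The experience key, the render_experience helper, and the LaTeX markup chosen here are assumptions, not the repository's actual template.

```python
# Data as it might look after content.yml is parsed (values from the
# example above; the "experience" key name is an assumption).
experience = [
    {"years": "1879--1889", "employer": "Freiberufler",
     "job": "Freier Philosoph", "city": "Sils-Maria"},
    {"years": "1869--1879", "employer": "Universität Basel",
     "job": "Professor für klassische Philologie", "city": "Basel"},
]


def render_experience(entries):
    """Emit one block of LaTeX per entry, as a template loop would."""
    blocks = []
    for entry in entries:
        blocks.append(
            f"{entry['years']}\\\\\n"
            f"\\textsc{{{entry['employer']}}}\\\\\n"
            f"\\emph{{{entry['job']}}}\\\\\n"
            f"{entry['city']}"
        )
    return "\n\n".join(blocks)


latex = render_experience(experience)
print(latex)
```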


Below is a preview of the final result. Check out the output to see the compiled PDF.



Dependencies

  1. LaTeX with the following extra packages: fontspec geometry multicol xunicode xltxtra marginnote sectsty ulem hyperref polyglossia
  2. Pandoc

To install LaTeX on Mac OS X, I recommend getting the smaller version BasicTeX from here and installing the additional packages with tlmgr afterwards. Same goes for Linux: install texlive-base with your package manager and add the needed additional packages later.

To install pandoc on Mac OS X, run brew install pandoc. To install it on Linux, refer to the official docs.

Getting started

  1. Edit content.yml with your personal details, work experience, education, and desired settings.
  2. Run make to compile the PDF.
  3. Tweak template.tex until you’re satisfied with the result.

Refer to pandoc’s documentation to learn more about how templates work.

Note: this template needs to be compiled with XeTeX.

Available settings

  • mainfont: Hoefler Text is the default, but every font installed on your system should work out of the box (thanks, XeTeX!)
  • fontsize: Possible values here are 10pt, 11pt and 12pt.
  • lang: Sets the main language through the polyglossia package. This is important for proper hyphenation, among other things.
  • geometry: A string that sets the margins through geometry. Read this to learn how this package works.

See also

  • letter-boilerplate — Quickly and painlessly generate high-quality letters from markdown through LaTeX


This repository contains a modified version of Dario Taraborelli’s cvtex template.

License: CC BY-SA 3.0


The Stack That Helped Medium Drive 2.6 Millennia of Reading Time

Current Stack

For a site as seemingly simple as Medium, it may be surprising how much complexity is behind the scenes. It’s just a blog, right? You could probably knock something out using Rails in a couple of days. 🙂

Anyway, enough snark. Let’s start at the bottom.

Production Environment

We are on Amazon’s Virtual Private Cloud. We use Ansible for system management, which allows us to keep our configuration under source control and easily roll out changes in a controlled way.

We have a service-oriented architecture, running about a dozen production services (depending on how you count them, and some are more micro than others). The main factors in deciding whether to deploy a separate service are the specificity of the work it performs, how likely dependent changes are to be made across service boundaries, and the resource utilization characteristics.

Our main app servers are still written in Node, which allows us to share code between server and client, something we use quite heavily with the editor and post transformations. Node has worked pretty well for us, but performance problems have emerged where we block the event loop. To alleviate this, we run multiple instances per machine and route expensive endpoints to specific instances, thus isolating them. We’ve also hooked into the V8 runtime to get insights into what ticks are taking a long time; generally it’s due to object reification during JSON deserialization.
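The isolation idea can be sketched in a few lines of Python: requests for endpoints known to be expensive go to a dedicated instance, so a blocked event loop there does not stall everything else. The ports, the path prefixes, and the pick_instance helper are hypothetical; the post does not say how the routing is actually configured.

```python
# Hypothetical instances on one machine, identified by local port.
GENERAL_PORTS = [3000, 3001, 3002]
EXPENSIVE_PORTS = [3100]

# Assumed path prefixes for endpoints that hold the event loop for a long time.
EXPENSIVE_PREFIXES = ("/export", "/stats")


def pick_instance(path, request_number):
    """Return the local port of the instance that should serve this request."""
    if path.startswith(EXPENSIVE_PREFIXES):
        pool = EXPENSIVE_PORTS
    else:
        pool = GENERAL_PORTS
    # Simple round-robin within the chosen pool.
    return pool[request_number % len(pool)]


print(pick_instance("/export/all", 7))  # served by the isolated instance
print(pick_instance("/p/some-post", 4))  # served by a general instance
```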

We have several auxiliary services written in Go. We’ve found Go very easy to build, package, and deploy. We like the type-safety without the verbosity and JVM tuning of Java. Personally, I’m a fan of using opinionated languages in a team environment; it improves consistency, reduces ambiguity, and ultimately gives you less rope to hang yourself.

We now serve static assets using CloudFlare, though we send 5% of traffic to Fastly and 5% to CloudFront to keep their caches warm should we need to cut over in an emergency. Recently we turned up CloudFlare for application traffic as well, primarily for DDoS protection, but we’ve also been happy with the performance gains.
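A traffic split like this can be sketched as a weighted choice. The version below hashes the asset path into one of 100 buckets so that a given asset always maps to the same CDN. Both the 90% share for CloudFlare (inferred as the remainder) and the hashing approach are assumptions; the post does not describe how the split is implemented.

```python
import hashlib

# Percentages: 5% each to the warm standbys, the remainder to the primary.
CDN_WEIGHTS = [("cloudflare", 90), ("fastly", 5), ("cloudfront", 5)]


def pick_cdn(asset_path, weights=CDN_WEIGHTS):
    """Map an asset path to a CDN name according to percentage weights."""
    digest = hashlib.sha256(asset_path.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0..99
    upper = 0
    for name, weight in weights:
        upper += weight
        if bucket < upper:
            return name
    raise ValueError("weights must sum to 100")


counts = {"cloudflare": 0, "fastly": 0, "cloudfront": 0}
for i in range(10000):
    counts[pick_cdn(f"/static/asset-{i}.js")] += 1
print(counts)
```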

We use a combination of Nginx and HAProxy as reverse proxies and load balancers, to satisfy the Venn Diagram of features we need.

We still use Datadog for monitoring and PagerDuty for alerts, but we now heavily use ELK (Elasticsearch, Logstash, Kibana) for debugging production issues.


DynamoDB is still our primary datastore, but it hasn’t been completely smooth sailing. One of the perennial issues we’ve hit is the hotkey issue during viral events or fanouts for million-follower users. We have a Redis cache cluster sitting in front of Dynamo, which mitigates these issues with reads. Optimizing for developer convenience and production stability have often seemed at odds, but we’re working to close the gap.
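The read path described here is the familiar read-through cache pattern. A minimal Python sketch, with a plain dict standing in for the Redis cluster and a callable standing in for Dynamo; the class and names are hypothetical, and expiry and invalidation on writes are deliberately left out.

```python
class ReadThroughCache:
    """Serve reads from a cache, falling back to the primary store on a miss."""

    def __init__(self, load_from_store):
        self._load = load_from_store  # stand-in for a Dynamo read
        self._cache = {}              # stand-in for the Redis cluster

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._load(key)       # only misses reach the primary store
        self._cache[key] = value
        return value


store_reads = []


def read_from_store(key):
    store_reads.append(key)
    return f"value-for-{key}"


cache = ReadThroughCache(read_from_store)
for _ in range(1000):                 # a "hot key" requested many times
    cache.get("viral-post")
print(len(store_reads))               # the store itself was read once
```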

We’re starting to use Amazon Aurora for some newer data, which allows more flexible querying and filtering than Dynamo.

We use Neo4J to store relations between the entities that represent the Medium network, running a master with two replicas. People, posts, tags, and collections are nodes in the graphs. Edges are created on entity creation and when people perform actions such as follow, recommend, and highlight. We walk the graph to filter and recommend posts.
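To make the idea of walking the graph concrete, here is a toy Python sketch using in-memory dicts instead of Neo4J. It follows a user's follow edges, then the recommend edges of those people, and ranks posts the user has not already recommended. The data, the scoring, and the recommend_posts helper are all hypothetical.

```python
from collections import Counter

# Toy edges: who follows whom, and who recommended which post.
FOLLOWS = {"ana": {"ben", "cy"}, "ben": {"cy"}, "cy": set()}
RECOMMENDS = {
    "ana": {"post-1"},
    "ben": {"post-1", "post-2"},
    "cy": {"post-2", "post-3"},
}


def recommend_posts(user):
    """Rank posts recommended by people the user follows."""
    already_seen = RECOMMENDS.get(user, set())
    scores = Counter()
    for followed in FOLLOWS.get(user, set()):
        for post in RECOMMENDS.get(followed, set()):
            if post not in already_seen:
                scores[post] += 1
    # Highest score first; ties broken by post id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [post for post, _ in ranked]


print(recommend_posts("ana"))
```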

Data Platform

From early on we’ve been very data hungry, investing in our analytics infrastructure to help us make business and product decisions. More recently we’re able to use the same data pipelines to feed back into production systems to power data-driven features such as Explore.

We use Amazon Redshift as our data warehouse, providing the scalable storage and processing system our other tools build on. We continuously import our core data set (e.g. users, posts) from Dynamo into Redshift, and event logs (e.g. post viewed, post scrolled) from S3 to Redshift.

Jobs are scheduled by Conduit, an internal tool that manages scheduling, data dependencies, and monitoring. We use an assertion-based scheduling model, where jobs will only be executed if their dependencies are satisfied (e.g. daily job that depends on an entire day of event logs). In production this has proved indispensable — data producers are decoupled from their consumers, simplifying configuration, and the system is very predictable and debuggable.
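Conduit is internal, so the following is only a sketch of what assertion-based scheduling could look like: each job lists the assertions it depends on, and it becomes runnable only once all of them hold. The job names, the assertion strings, and the runnable_jobs helper are hypothetical.

```python
# Each job maps to the set of assertions that must hold before it may run.
JOBS = {
    "daily_reading_time": {"event_logs:day-1:complete"},
    "daily_digest": {"event_logs:day-1:complete", "users:imported"},
}


def runnable_jobs(jobs, satisfied):
    """Return the names of jobs whose dependencies are all satisfied."""
    return sorted(name for name, deps in jobs.items() if deps <= satisfied)


satisfied = {"event_logs:day-1:complete"}
print(runnable_jobs(JOBS, satisfied))   # only the first job can run

satisfied.add("users:imported")
print(runnable_jobs(JOBS, satisfied))   # now both can run
```

Because producers only publish assertions and consumers only declare what they need, neither side has to know about the other, which matches the decoupling described above.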

While SQL queries running in Redshift work well for us, we need to get data into and out of Redshift. We’ve increasingly turned to Apache Spark for ETL because of its flexibility and ability to scale with our growth. Over time Spark will likely become the tool of choice for our data pipelines.

We use Protocol Buffers for our schemas (and schema evolution rules) to keep all layers of the distributed system in sync, including mobile apps, web service, and data warehouse. Using custom options, we annotate our schemas with configuration details like table name and indexes, and validation constraints like max length for strings, or acceptable ranges for numbers.

People need to remain in sync too, so that mobile and web app developers can all log the same way and Product Scientists can interpret fields the same way. We help our people work with data by treating the schemas as the spec, rigorously documenting messages and fields and publishing generated documentation from the .proto files.


Our image server is now written in Go and uses a waterfall strategy for serving processed images. The servers use groupcache, which provides a memcache alternative while helping to reduce duplicated work across the fleet. The in-memory cache is backed by a persistent S3 cache; then images are processed on demand. This gives our designers the flexibility to change image presentation and optimize for different platforms without having to do large batch jobs to generate resized images.
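The waterfall can be sketched as three steps tried in order: in-memory cache, then persistent cache, then processing on demand. The real server is written in Go; this Python sketch only illustrates the lookup order, with dicts standing in for groupcache and S3, and a hypothetical ImageServer class.

```python
class ImageServer:
    """Look up a processed image: memory, then persistent cache, then compute."""

    def __init__(self, originals, process):
        self.originals = originals    # stand-in for stored source images
        self.process = process        # e.g. a resize or crop function
        self.memory_cache = {}        # stand-in for groupcache
        self.persistent_cache = {}    # stand-in for the S3 cache
        self.processed_count = 0

    def get(self, image_id, width):
        key = (image_id, width)
        if key in self.memory_cache:
            return self.memory_cache[key]
        if key in self.persistent_cache:
            result = self.persistent_cache[key]
        else:
            # Last resort: process the original on demand and persist it.
            result = self.process(self.originals[image_id], width)
            self.processed_count += 1
            self.persistent_cache[key] = result
        self.memory_cache[key] = result
        return result


server = ImageServer({"cat": "cat-bytes"}, lambda data, width: f"{data}@{width}")
server.get("cat", 800)
server.get("cat", 800)          # served from memory
server.memory_cache.clear()     # simulate a restarted process
server.get("cat", 800)          # served from the persistent cache
print(server.processed_count)   # the image was processed only once
```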

While it’s now mainly used for resizing and cropping, earlier versions of the site allowed for color washes, blurring, and other image effects. Processing animated gifs has been a huge pain for reasons that should be another post.


The fun TextShots feature is powered by a small Go server that interfaces with PhantomJS as a renderer process.

I always imagined switching the rendering engine to use something like Pango, but in practice, the ability to lay out the image in HTML is way more flexible and convenient. And the frequency at which the feature is used means we can handle the throughput quite easily.

Custom Domains

We allow people to set up custom domains for their Medium publications. We wanted single sign-on and HTTPS everywhere, so it wasn’t super trivial to get working. We have a set of dedicated HAProxy servers that manage certs and route traffic to the main fleet of application servers. There is some manual work required when setting up a domain, but we’ve automated large swathes of it through a custom integration with Namecheap. The cert provisioning and publication linking is handled by a dedicated service.

Web Frontend

On the web, we tend to want to stay close to the metal. We have our own Single Page Application framework that uses Closure as a standard library. We use Closure Templates for rendering on both the client and the server, and we use the Closure Compiler to minify the code and split it into modules. The editor is the most complex part of our web app, which Nick has written about.


Both our apps are native, making minimal use of web views.

On iOS, we use a mixture of homegrown frameworks and built-in components. In our network layer, we use NSURLSession for making requests and Mantle for parsing JSON into models. We have a caching layer built on top of NSKeyedArchiver. We have a generic way to render items in a list with a common styling, which allows us to quickly build new lists with different types of content. The post view is built with a UICollectionView with a custom layout. We use shared components to render the full post and the post preview.

Every commit is built and pushed to Medium employees, so that we can try out the app as quickly as possible. The cadence of our releases to the App Store is beholden to the review cycle, but we try to keep pushing as fast as we can, even if there are only minimal updates.

For tests, we use XCTest and OCMock.


On Android, we stay current with the very latest editions of the SDK and support libraries. We don’t use any comprehensive frameworks, preferring instead to establish consistent patterns for repeated problems. We use Guava for all the things missing from Java. But otherwise, we tend to use 3rd party tools that aim to solve narrower problems. We define our API responses using protocol buffers and then generate the objects we use in the app.

We use Mockito and Robolectric. We write high-level tests that spin up activities and poke around; we create basic versions of these when we first add a screen or to prepare for refactoring. They grow as we reproduce bugs to shield against regression. We write low-level tests to exercise the particulars of a single class; we create these as we build out new features and they help us reason about how our classes interact.

Every commit is automatically pushed to the Play Store as an alpha build, which goes out to Medium staff right away. (This includes another flavor of the app for Hatch, our internal version of Medium.) Most Fridays we promote the latest alpha to our beta group and have them play with things over the weekend. Then, on Monday, we promote it from beta to production. Since the latest code is always ready for release, when we find a bad bug, we get the fix out to production immediately. When we’re worried about a new feature, we let the beta group play with things a little longer; when we’re excited, we release even more frequently.

A|B Testing & Feature Flags

All our clients use server-supplied feature flags, called variants, for A|B testing and guarding unfinished features.
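The post does not describe how variants are assigned, so the following is only one plausible sketch: the server hashes the flag name and user id into a stable bucket and sends each client the resulting set of flags. The flag names, rollout percentages, and both helper functions are hypothetical.

```python
import hashlib


def variant_enabled(flag, user_id, rollout_percent):
    """Deterministically decide whether a flag is on for a given user."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket 0..99
    return bucket < rollout_percent


def variants_for(user_id, rollouts):
    """Build the flag set the server would send to a client."""
    return {flag: variant_enabled(flag, user_id, percent)
            for flag, percent in rollouts.items()}


ROLLOUTS = {"unfinished_feature": 0, "new_post_layout": 50, "launched_feature": 100}
print(variants_for("user-42", ROLLOUTS))
```

A flag at 0% guards an unfinished feature, a flag at 100% is fully launched, and anything in between behaves like an A|B test whose assignment stays stable for a given user.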


There are a lot of other things on the fringe of the product that I haven’t mentioned above: Algolia has allowed us to iterate quickly on search-related functionality, SendGrid for inbound and outbound email, Urban Airship for notifications, SQS for queue processing, Bloomd for bloom filters, PubSubHubbub and Superfeedr for RSS, etc. etc.


A Matter of Trust: Why the Time is Right to Adopt the Uniform Electronic Legal Materials Act (UELMA) in Florida

In this article, Law Librarian Patricia Morgan brings our attention to a group of prominent, related issues in electronic legal research that are critical for attorneys, librarians and courts. In an era where cost-cutting has become increasingly important, there already exists an untapped resource related to legal research. More and more resources exist online (some exclusively). It has been a long time since the introduction of the Internet, but it is finally going to prove instrumental in reducing the cost of legal research. It is time to come to terms with the fact that most legal material should be readily available electronically and that there must be a way to verify that the material is authentic. As Morgan queries and answers: Uniform Law, Anyone?


Judge tosses Wikimedia’s anti-NSA lawsuit because Wikipedia isn’t big enough

On Friday, a federal judge dismissed an anti-surveillance lawsuit brought by Wikimedia, ruling in favor of the National Security Agency.

In his 30-page ruling, US District Judge T.S. Ellis III found that Wikimedia and the other plaintiffs had no standing, and could not prove that they had been surveilled, largely echoing the previous 2013 Supreme Court decision in the case of Clapper v. Amnesty International.

Judge Ellis found that there is no way to definitively know if Wikimedia, which publishes Wikipedia, one of the largest sites on the Internet, is being watched.

As he wrote in his memorandum opinion:

Plaintiffs’ argument is unpersuasive, as the statistical analysis on which the argument rests is incomplete and riddled with assumptions. For one thing, plaintiffs insist that Wikipedia’s over one trillion annual Internet communications is significant in volume. But plaintiffs provide no context for assessing the significance of this figure. One trillion is plainly a large number, but size is always relative. For example, one trillion dollars are of enormous value, whereas one trillion grains of sand are but a small patch of beach.

As already discussed, although plaintiffs have alleged facts that plausibly establish that the NSA uses Upstream surveillance at some number of chokepoints, they have not alleged facts that plausibly establish that the NSA is using Upstream surveillance to copy all or substantially all communications passing through those chokepoints. In this regard, plaintiffs can only speculate, which Clapper forecloses as a basis for standing.

Since the June 2013 Snowden revelations, by and large, it has been difficult for legal challenges filed against government surveillance to advance in the courts.


Man Licenses His Video Footage To Sony, Sony Issues Copyright Claim Against Him

An anonymous reader writes: Mitch Martinez creates high-resolution stock video footage, and then licenses it out to people who need footage to go along with their creative projects. He has written an article at PetaPixel explaining his bizarre interaction with Sony Music Entertainment, and the hassle they put him through to fix it. Martinez licensed one of his videos to Epic Records, and they used it as background for a music video on YouTube. Less than two months later, his original video on YouTube was hit with a copyright claim from Sony. After figuring out that Epic Records was a subsidiary of Sony, he disputed the copyright claim, which is usually the end of it. But after reviewing the videos, Sony rejected the dispute, saying their claim was still valid. Martinez then tried to contact the person at Epic Records to whom he issued the license. None of his emails got a response. Then he had to get in touch with Epic’s legal department. After a lengthy series of emails, voicemails, and phone calls, he finally got somebody to admit it was his video. It still took a few more calls to work out the details, but the company finally released the copyright claim. Martinez concludes by offering some tips on how to resolve such claims.

Read more of this story at Slashdot.


VersionPress Picks Up Backing From Credo Ventures For ‘WordPress Meets Git’ Solution

Anybody who has run or developed a WordPress-powered site, be it a humble blog or something more complex, knows that it’s pretty easy to make undesirable changes. This can be either content-related or a change to the WordPress theme or plugin you’re running. While backing up is crucial, a primitive backup doesn’t always let you roll back to the exact point where everything…

