ContentTools: A WYSIWYG editor for HTML content


A beautiful & small content editor


Getting started

The ContentTools WYSIWYG editor can be added to any HTML
page in a few simple steps. The getting started guide
shows how.


Full API documentation and examples for the ContentTools family
of libraries.


Step-by-step guides for common use scenarios as well as more
advanced topics for those rolling their own editors.


The ContentTools family of libraries is free and open-source.
The libraries are hosted, developed and maintained on GitHub.


Hadoop filesystem at Twitter

Twitter runs multiple large Hadoop clusters that are among the biggest in the world. Hadoop is at the core of our data platform and provides vast storage for analytics of user actions on Twitter. In this post, we will highlight our contributions to ViewFs, the client-side Hadoop filesystem view, and its versatile usage here.

ViewFs makes the interaction with our HDFS infrastructure as simple as a single namespace spanning all datacenters and clusters. HDFS Federation helps with scaling the filesystem to our needs for number of files and directories while NameNode High Availability helps with reliability within a namespace. These features combined add significant complexity to managing and using our several large Hadoop clusters with varying versions. ViewFs removes the need for us to remember complicated URLs by using simple paths. Configuring ViewFs itself is a complex task at our scale. Thus, we run TwitterViewFs, a ViewFs extension we developed, that dynamically generates a new configuration so we have a simple holistic filesystem view.

Hadoop at Twitter: scalability and interoperability
Our Hadoop filesystems host over 300PB of data on tens of thousands of servers. We scale HDFS by federating multiple namespaces. This approach allows us to sustain a high HDFS object count (inodes and blocks) without resorting to a single large Java heap size that would suffer from long GC pauses and the inability to use compressed oops. While this approach is great for scaling, it is not easy for us to use because each member namespace in the federation has its own URI. We use ViewFs to provide an illusion of a single namespace within a single cluster. As seen in Figure 1, under the main logical URI we create a ViewFs mount table with links to the appropriate mount point namespaces for paths beginning with /user, /tmp, and /logs, respectively.

The configuration of the view depicted in Figure 1 translates to a lengthy configuration of a mount table named clusterA. Logically, you can think of this as a set of symbolic links, which we abbreviate as /logs->hdfs://logNameSpace/logs. Our TwitterViewFs extension to ViewFs handles both hdfs:// and viewfs:// URIs on the client side, letting us onboard hundreds of Hadoop 1 applications without code changes.
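The symbolic-link analogy can be sketched as longest-prefix resolution over a small table. This is an illustrative model only, not Twitter's code, and the tmp namespace name below is made up; a real mount table is expressed through fs.viewfs.mounttable.clusterA.link./logs style keys in the Hadoop configuration.

```python
# Illustrative model of a ViewFs mount table: symbolic-link-style entries
# resolved by longest-prefix match. Namespace names other than logNameSpace
# and dc-A-user-ns are hypothetical.

MOUNT_TABLE = {
    "/user": "hdfs://dc-A-user-ns/user",
    "/tmp":  "hdfs://dc-A-tmp-ns/tmp",
    "/logs": "hdfs://logNameSpace/logs",
}

def resolve(path: str) -> str:
    """Resolve a logical ViewFs path to a physical namespace URI."""
    matches = [m for m in MOUNT_TABLE if path == m or path.startswith(m + "/")]
    if not matches:
        raise FileNotFoundError(f"no mount point for {path}")
    mount = max(matches, key=len)  # longest matching prefix wins
    return MOUNT_TABLE[mount] + path[len(mount):]

print(resolve("/logs/app/part-0"))  # hdfs://logNameSpace/logs/app/part-0
```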

Twitter’s Hadoop client and server nodes store configurations of all clusters. At Twitter, we don’t invoke the hadoop command directly. Instead we use a multiversion wrapper hadoop that dispatches to different hadoop installs based on a mapping from the configuration directory to the appropriate version. We store the configuration of cluster C in the datacenter DC abbreviated as C@DC in a local directory /etc/hadoop/hadoop-conf-C-DC, and we symlink the main configuration directory for the given node as /etc/hadoop/conf.
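The wrapper's dispatch logic might look roughly like the following sketch. The install paths and the version mapping are assumptions for illustration; the actual wrapper script is not shown in the post.

```python
# Sketch of multiversion dispatch: map the configuration directory for C@DC
# to the matching Hadoop install. Paths and versions here are hypothetical.
import os
import re

# Hypothetical mapping from C-DC cluster name to installed Hadoop version.
VERSION_BY_CLUSTER = {"c1-dc1": "hadoop-1.2", "c2-dc1": "hadoop-2.6"}

def hadoop_home(conf_dir: str) -> str:
    """Map /etc/hadoop/hadoop-conf-C-DC to the matching install directory."""
    m = re.fullmatch(r"/etc/hadoop/hadoop-conf-(.+)", conf_dir)
    if not m:
        raise ValueError(f"unexpected configuration directory: {conf_dir}")
    return os.path.join("/usr/local", VERSION_BY_CLUSTER[m.group(1)])

print(hadoop_home("/etc/hadoop/hadoop-conf-c2-dc1"))  # /usr/local/hadoop-2.6
```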

Consider a DistCp from source to destination. Given a Hadoop 2 destination cluster (which is very common during migration), the source cluster has to be referenced via read-only Hftp regardless of the version of the source cluster. In the case of a Hadoop 1 source, Hftp is used because the Hadoop 1 client is not wire-compatible with Hadoop 2. In the case of a Hadoop 2 source, Hftp is used because federation means there is no single HDFS URI. Moreover, with DistCp we have to use the destination cluster configuration to submit the job. However, the destination configuration does not contain information about HA and federation on the source side. Our previous solution, implementing a series of redirects to the right NameNode, proved insufficient to cover all scenarios encountered in production, so we merge all cluster configurations on the client side to generate one valid configuration for HDFS HA and ViewFs for all Twitter datacenters, as described in the next section.
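The scheme-selection rule above can be summarized in a small helper. The function name and signature are ours, for illustration only, not an API from the post.

```python
# Hypothetical helper capturing the DistCp source-scheme rule: with a
# Hadoop 2 destination, the source is always read via Hftp.

def distcp_source_scheme(source_is_hadoop1: bool, dest_is_hadoop2: bool) -> str:
    """Pick the filesystem scheme for referencing the DistCp source cluster."""
    if dest_is_hadoop2:
        # The source version is irrelevant here:
        # - Hadoop 1 source: not wire-compatible with the Hadoop 2 client.
        # - Hadoop 2 source: federation means there is no single hdfs:// URI.
        return "hftp"
    return "hdfs"
```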

User-friendly paths instead of long URIs

We developed user-friendly paths instead of long URIs and enabled native access to HDFS. This removes the overwhelming number of different URIs and greatly increases the availability of the data. When we use multi-cluster applications, we have to cope with the full URIs that sometimes have a long authority part represented by a NameNode CNAME. Furthermore, if the cluster mix includes both Hadoop 1 and Hadoop 2, which are not wire-compatible, we unfortunately have to remember which cluster to address via the interoperable Hftp filesystem URI. The volume of questions around this area on our internal Twitter employee mailing lists, chat channels and office hours motivated us to solve this URI problem for good on the Hadoop 2 side. We realized that since we already present multiple namespaces as a single view within a cluster, we should do the same across all clusters within a datacenter, or even across all datacenters. The idea is that a path /path/file at the cluster C1 in the datacenter DC1 should be mounted by ViewFs in each cluster as /DC1/C1/path/file, as shown in Figure 3. This way we will never have to specify a full URI, nor remember whether Hftp is needed, because we can transparently link via Hftp within ViewFs.

With our growing number of clusters and number of namespaces per cluster, it would be very cumbersome if we had to maintain additional mount table entries in each cluster configuration manually, as it turns into an O(n²) configuration problem. In other words, if we change the configuration of just one cluster we need to touch all n cluster configurations just for ViewFs. We also need to handle the HDFS client configuration for nameservices because otherwise mount point URIs cannot be resolved by the DFSClient.

It’s quite common that we have the same logical cluster in multiple datacenters for load balancing and availability: C1@DC1, C1@DC2, etc. Thus, we decided to add some more features to TwitterViewFs. Instead of populating the configurations administratively, our code automatically adds the configuration keys needed for the global cross-datacenter view at runtime during filesystem initialization. This allows us to change existing namespaces in one cluster, or add more clusters, without touching the configuration of the other clusters. By default our filesystem scans the glob file:/etc/hadoop/hadoop-conf-*.

The following steps construct the TwitterViewFs namespace. When the Hadoop client is started with a specific C-DC cluster configuration directory, the following keys are added from all other C’-DC’ directories during the TwitterViewFs initialization:

  1. If there is a ViewFs mount point link like /path->hdfs://nameservice/path in C’-DC’, then we will add a link /DC’/C’/path->hdfs://nameservice/path. For the Figure 1 example above, we would add to all cluster configurations: /dc/a/user->hdfs://dc-A-user-ns/user
  2. Similarly, for consistency, we duplicate all conventional links /path->hdfs://nameservice/path for C-DC as /DC/C/path->hdfs://nameservice/path. This allows us to use the same notation regardless of whether we work with the default C-DC cluster or a remote cluster.
  3. We can easily detect whether the configuration C’-DC’ that we are about to merge dynamically is a legacy Hadoop 1 cluster. For Hadoop 1, the key fs.defaultFS points to an hdfs:// URI, whereas for Hadoop 2, it points to a viewfs:// URI. Our Hadoop 1 clusters consist of a single namespace/NameNode, so we can transparently substitute the hftp scheme for the hdfs scheme and simply add the link: /DC/C’/->hftp://hadoop1nn/
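The three steps above can be sketched as a merge over configuration dictionaries. This is a simplified model, not the actual TwitterViewFs code; the key names follow Hadoop's fs.viewfs.mounttable.*.link. convention, and the mount table name "default" is an assumption.

```python
# Sketch of TwitterViewFs namespace construction (steps 1-3), modeling each
# cluster's configuration as a dict of key -> value.

def merge_cluster(conf: dict, dc: str, cluster: str, other_conf: dict,
                  mounttable: str = "default") -> None:
    prefix = f"fs.viewfs.mounttable.{mounttable}.link."
    default_fs = other_conf.get("fs.defaultFS", "")
    if default_fs.startswith("viewfs://"):
        # Hadoop 2: re-export every mount link under /DC/C (steps 1 and 2).
        for key, target in other_conf.items():
            if key.startswith(prefix):
                path = key[len(prefix):]
                conf[f"{prefix}/{dc}/{cluster}{path}"] = target
    elif default_fs.startswith("hdfs://"):
        # Hadoop 1: single NameNode, so link the whole cluster via hftp (step 3).
        conf[f"{prefix}/{dc}/{cluster}/"] = default_fs.replace("hdfs://", "hftp://", 1)

conf = {}
merge_cluster(conf, "dc1", "c1", {
    "fs.defaultFS": "viewfs://clusterA",
    "fs.viewfs.mounttable.default.link./user": "hdfs://dc-A-user-ns/user",
})
merge_cluster(conf, "dc1", "h1", {"fs.defaultFS": "hdfs://hadoop1nn"})
```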

Now the TwitterViewFs namespace is defined. However, at this stage ViewFs links pointing to hdfs nameservices cannot be used by the DFSClient yet. In order to make HA nameservice URIs resolvable, we need to merge the relevant HDFS client configuration from all hdfs-site.xml files in C’-DC’ directories. Here’s how we do this:

  1. HDFS uses the key dfs.nameservices to store a comma-separated list of all the nameservices DFSClient needs to resolve. We append the values of all C’-DC’ to the dfs.nameservices value of the current cluster. We typically have 3-4 namespaces per cluster.
  2. All namespace-specific parameters in HDFS carry the namespace somewhere in the suffix. Twitter namespace names are unique and mnemonic enough that a simple heuristic of copying all key-value pairs from C’-DC’ where the key name begins with “dfs” and contains one of the nameservices from Step 1 is sufficient.
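These two steps can be sketched as follows, modeling each cluster's hdfs-site.xml as a dictionary. This is illustrative only; the sample nameservice and keys mirror standard HDFS HA configuration rather than Twitter's actual values.

```python
# Sketch of the two-step HDFS client configuration merge (steps 1 and 2 above).

def merge_hdfs_client_conf(conf: dict, other: dict) -> None:
    other_ns = [s for s in other.get("dfs.nameservices", "").split(",") if s]
    ours = [s for s in conf.get("dfs.nameservices", "").split(",") if s]
    # Step 1: append the remote cluster's nameservices to dfs.nameservices.
    conf["dfs.nameservices"] = ",".join(ours + [s for s in other_ns if s not in ours])
    # Step 2: copy every key beginning with "dfs" that names one of those
    # nameservices somewhere in its suffix.
    for key, value in other.items():
        if key.startswith("dfs") and any(ns in key for ns in other_ns):
            conf[key] = value

conf = {"dfs.nameservices": "c1-ns"}
merge_hdfs_client_conf(conf, {
    "dfs.nameservices": "dc-A-user-ns",
    "dfs.ha.namenodes.dc-A-user-ns": "nn1,nn2",
    "dfs.namenode.rpc-address.dc-A-user-ns.nn1": "host1:8020",
})
```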

Now we have a working TwitterViewFs with all clusters accessible via the /DC/C/path convention regardless of whether a specific C is a Hadoop 1 or a Hadoop 2 cluster. A powerful example of this scheme is to check the quota of home directories on all clusters in one single command: hadoop fs -count '/{dc1,dc2}/*/user/gera'

We can also easily run fsck on any of the namespaces without remembering the exact complex URI: hadoop fsck /dc1/c1/user/gera

We want a consistent experience when working with the local filesystem and HDFS. It is much easier to remember conventional commands such as cp than “syntactic-sugar commands” such as copyFrom/ToLocal, put, get, etc. A regular hadoop cp command requires a full file:/// URI, which is what the syntactic-sugar commands try to simplify. When mounted with ViewFs, even this is not necessary. Similar to how we add ViewFs links for each cluster under /DC/cluster, we add a /local link pointing at the local filesystem to the TwitterViewFs configuration.

Then, copying a file from a cluster to a local directory looks like:
hadoop fs -cp /user/laurent/debug.log /local/user/laurent/

The simple, unified cross-DC view on an otherwise fragmented Hadoop namespace has pleased internal users and sparked public interest.

High availability for multi-datacenter environment
Beyond this, we created a project code-named Nfly (N as in N datacenters), where we implement much of the HA and multi-datacenter functionality in ViewFs itself in order to avoid unnecessary code duplication. Nfly is able to link a single ViewFs path to multiple clusters. When using Nfly, one appears to interact with a single filesystem, while in reality each write is applied in the background to all linked clusters, and a read is performed from either the closest cluster (according to NetworkTopology) or the one with the most recent available copy. Nfly makes cross-datacenter HA very easy. The fusion of multiple physical paths into one logical, more available path is achieved with a new replication multi-URI inode. This is tailored to a common HDFS usage pattern in our highly available Twitter services. Our services host their data on some logical cluster C. New service data versions are created periodically, though relatively infrequently, and are read by many different servers. There is a corresponding HDFS cluster in multiple datacenters. When the service runs in datacenter DC1, it prefers to read from /DC1/C for lower latency. However, when data under /DC1/C is unavailable, the service wants to fail over its reads to the higher-latency path /DC2/C instead of exposing the outage to its users.

A conventional ViewFs mount point inode points directly to a single URI via ChRootedFileSystem, which is why there is one arrow between nodes in Figure 3 above. The user namespace (green above) of ClusterA in datacenter DC1 is mounted using the mount point entry /DC1/clusterA/user->hdfs://dc1-A-user/user. When the application passes the path /DC1/clusterA/user/lohit, it is resolved as follows. The root portion of the path, between the root / and the mount point inode user (top of the namespace tree in Figure 3), is replaced by the link target value hdfs://dc1-A-user/user. The result, hdfs://dc1-A-user/user/lohit, is then used to access the physical FileSystem. Replacing the root portion is called chrooting in this context, hence the name ChRootedFileSystem. Thus, if we had multiple URIs in the inode, we could back a single logical path by multiple physical filesystems, typically residing in different datacenters.
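The chrooting step can be sketched in a few lines. This is an illustrative model of the prefix replacement, not the actual ChRootedFileSystem code.

```python
# Sketch of chrooting: the mount-point prefix of the logical path is replaced
# by the link target to produce the physical URI.

def chroot_resolve(path: str, mount_point: str, target: str) -> str:
    """Replace the mount-point prefix of a logical path with the link target."""
    assert path == mount_point or path.startswith(mount_point + "/")
    return target + path[len(mount_point):]

print(chroot_resolve("/DC1/clusterA/user/lohit",
                     "/DC1/clusterA/user",
                     "hdfs://dc1-A-user/user"))
# hdfs://dc1-A-user/user/lohit
```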

Consequently, we introduce a new type of link pointing to a list of URIs, each wrapped in a ChRootedFileSystem. The basic principle is that a write call is propagated to each filesystem represented by the URIs synchronously. On the read path, the FileSystem client picks the URI pointing to the closest destination, such as one in the same datacenter. A typical usage is /nfly/C/user->/DC1/C/user,/DC2/C/user,… The message sequence diagram in Figure 4 illustrates this scenario.

This collection of ChRootedFileSystem instances is fronted by the Nfly filesystem object that is used for the mount point inode. The Nfly filesystem backs a single logical path /nfly/C/user/path by multiple physical paths. It supports setting minReplication. As long as the number of URIs on which an update has succeeded is greater than or equal to minReplication, exceptions are merely logged but not thrown. Each update operation is currently executed serially; however, we plan to add a feature to use parallel writes from the client as far as its bandwidth permits.

With Nfly a file create or write is executed as follows:

  1. Creates a temporary invisible _nfly_tmp_file in the intended chrooted filesystem.
  2. Returns a FSDataOutputStream that wraps the output streams returned by step 1.
  3. All writes are forwarded to each output stream.
  4. On close of the stream created in step 2, all n streams are closed, and the files are renamed from _nfly_tmp_file to file. All files receive the same mtime corresponding to the client system time as of the beginning of this step.
  5. If at least minReplication destinations have gone through steps 1 to 4 without failures, the filesystem considers the transaction logically committed; otherwise it tries to clean up the temporary files in a best-effort attempt.
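The five steps above can be sketched against local directories standing in for the linked clusters. This is an illustration of the protocol only; real Nfly wraps HDFS output streams rather than local files.

```python
# Sketch of the Nfly create/write protocol: write temp files everywhere,
# rename with a shared mtime, commit if at least minReplication succeeded.
import os
import time

def nfly_create(dest_dirs, name, data: bytes, min_replication: int) -> bool:
    tmp = "_nfly_tmp_" + name
    written = []
    for d in dest_dirs:                          # steps 1-3: write temp files
        try:
            with open(os.path.join(d, tmp), "wb") as f:
                f.write(data)
            written.append(d)
        except OSError:
            pass
    mtime = time.time()                          # step 4: rename, shared mtime
    committed = []
    for d in written:
        try:
            final = os.path.join(d, name)
            os.rename(os.path.join(d, tmp), final)
            os.utime(final, (mtime, mtime))
            committed.append(d)
        except OSError:
            pass
    if len(committed) >= min_replication:        # step 5: commit...
        return True
    for d in committed:                          # ...or best-effort cleanup
        try:
            os.remove(os.path.join(d, name))
        except OSError:
            pass
    return False
```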

As for reads, we support a notion of locality similar to HDFS's /DC/rack/node. We sort URIs using NetworkTopology by their authorities, which are typically host names in simple HDFS URIs. If the authority is missing, as is the case with local file:/// URIs, the local host name from InetAddress.getLocalHost() is assumed. This ensures that the local filesystem is always considered to be the closest one to the reader. For our Hadoop 2 hdfs URIs that are based on nameservice ids instead of hostnames, it is very easy to adjust the topology script since our nameservice ids already contain the datacenter reference. As for rack and node, we can simply output any string, such as /DC/rack-nsid/node-nsid, because we are concerned only with datacenter locality for such filesystem clients.

There are two policies/additions to the read call path that make it more computationally expensive, but improve user experience:

  1. readMostRecent – Nfly first checks mtime for the path under all URIs and sorts them from most to least recent. Nfly then sorts the set of URIs with the most recent mtime topologically in the same manner as described above.
  2. repairOnRead – Nfly already has to contact all underlying destinations. With repairOnRead, the Nfly filesystem would additionally attempt to refresh destinations with the path missing or a stale version of the path using the nearest available most recent destination.
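The readMostRecent policy can be sketched as a two-key ordering. This is a hypothetical model: plain distance numbers stand in for NetworkTopology results.

```python
# Sketch of readMostRecent: keep only the replicas with the newest mtime,
# then order those by topological distance to the reader.

def read_order(replicas):
    """replicas: list of (uri, mtime, distance); returns URIs to try in order."""
    newest = max(mtime for _, mtime, _ in replicas)
    candidates = [r for r in replicas if r[1] == newest]
    return [uri for uri, _, _ in sorted(candidates, key=lambda r: r[2])]

print(read_order([("hdfs://dc1-ns/user/f", 100, 0),
                  ("hdfs://dc2-ns/user/f", 200, 1),
                  ("hdfs://dc3-ns/user/f", 200, 2)]))
# ['hdfs://dc2-ns/user/f', 'hdfs://dc3-ns/user/f']
```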

As we pointed out before, managing ViewFs configurations can already be quite cumbersome, and Nfly mounts make it even more complicated. Luckily, TwitterViewFs provides mechanisms with sufficient flexibility to add more code in order to generate useful Nfly configurations “on the fly”. If Twitter employees want their home directories on the logical cluster C across all DCs nflied under /nfly/C/user/<user>, they simply specify -Dfs.nfly.mount=C. If they additionally want to cache the files locally under /local/user/<user>/C, they specify -Dfs.nfly.local=true.

Future work
The multi-URI inode introduced for Nfly lays the groundwork for the read-only Merge FileSystem that transparently merges inodes from the underlying filesystems. This is something we’re currently working on implementing. It will allow us to cut the number of mount table entries dramatically compared to the single-URI inode approach. The target use case for the Merge FileSystem is to split an existing namespace, for example the user namespace, into two namespaces without the need for users to adjust code, and without bloating configuration. To see this illustrated, you can compare Figures 5 and 6.

In this post we shared our approach to managing Hadoop filesystems at Twitter: scaling to meet our needs for vast storage using federated namespaces while maintaining simplicity through ViewFs. We extended ViewFs to simplify its operation in the face of an ever-growing number of clusters and namespaces in multiple datacenters, and added Nfly for cross-datacenter availability of HDFS data. We believe that the broader Hadoop user community will benefit from our experience.

We would like to thank Laurent Goujon, Lohit VijayaRenu, Siqi Li, Joep Rottinghuis, the Hadoop team at Twitter and the wider Hadoop community for helping us scale Hadoop at Twitter.


Constitutions as Summer Reading

Two years ago my collaborators and I introduced a new resource for understanding constitutions. We call it Constitute. It’s a web application that allows users to extract excerpts of constitutional text, by topic, for nearly every constitution in the world currently in force. One of our goals is to shed some of the drudgery associated with reading legal text. Unlike credit card contracts, constitutions were meant for reading (and by non-lawyers). We have updated the site again, just in time for summer (see below). Curl up in your favorite retreat with Constitute this summer and tell us what you think.

Some background: Constitute is built primarily for those engaged in the challenge of drafting constitutions, which occurs more frequently than some think (4-5 constitutions are replaced each year and many more are revised in smaller ways). Drafters often want to view examples of text from a representative set of countries – mostly so that they can understand the multiple dimensions of a particular area of law. Of course, scholars and educators will also find many uses for the data. After all, the resource grew out of an effort to study constitutions, not write them.

How does Constitute differ from other constitutional repositories? The core advantage of Constitute is the ability to view constitutional excerpts by topic. These topics are derived from the conceptual inventory of constitutions that my collaborators and I have been developing and refining over the last ten years as part of the Comparative Constitutions Project (CCP). The intent of that project is to record the content of the world’s constitutions in order to answer questions about the origins and effects of various constitutional provisions. In order to build that dataset (CCP), we invested quite a bit of time in (1) identifying when constitutions in each country had been enacted, revised, or replaced, (2) tracking down the texts associated with each of these changes, (3) digitizing and archiving the texts, (4) building the conceptual apparatus to extract information about their content, and finally, (5) reading and interpreting the texts. We leveraged all of this information in building Constitute.

We are committed to refining and elaborating Constitute. Our recent release includes some exciting developments, some of which I describe here.

Now in Arabic! Until now, Constitute’s texts have been in English. However, we believe (with some evidence) that readers strongly prefer to read constitutions in their native language. Thus, with a nod to the constitutional activity borne of the Arab Spring, we have introduced a fully functioning Arabic version of the site, which includes a subset of Constitute’s texts. Thanks here to our partners at International IDEA, who provided valuable intellectual and material resources.

Form and function. One distinction of Constitute is the clarity and beauty of its reading environment. Constitutional interpretation is hard enough as it is. Constitute’s texts are presented in a clean typeset environment that facilitates and invites reading, not sleep and irritability. In the latest release, we introduce a new view of the data — a side-by-side comparison of two constitutions. While in our usual “list view,” you can designate up to eight constitutions for inclusion in the comparison set, once in “compare view,” you can choose any two from that set for side-by-side viewing. In compare view, you’ll find our familiar search bar and topic menu in the left panel to drive and refine the comparison. By default, compare view displays full constitutions with search results highlighted and navigable (if there are multiple results). Alternatively, you can strip away the content and view selected excerpts in isolation by clicking the button at the right of the texts. It is an altogether new, and perhaps better, way to compare texts.

Sharing and analyzing. Many users will want to carve off slices of data for digestion elsewhere. In that sense, scholars and drafting committees alike will appreciate that the site was built by and for researchers. Exporting is painless. Once you pin the results, you can export to a .pdf file or to Google Docs to collaborate with your colleagues. You can also export pinned results to a tabulated .csv file, which will be convenient for those of you who want to manage and analyze the excerpts using your favorite data applications. Not only that, but our “pin search” and “pin comparison” functions allow analysts to carve large slices of data and deposit them in the Pinned page for scaled-up analysis.

Raw data downloads. For those of you who build web applications or are interested in harnessing the power of Linked Data, we have exposed our linked data as a set of downloads and as a SPARQL endpoint, for people and machines to consume. Just follow the Data link on “More Info” in the left panel of the site.

And then there is “deep linking,” so that you can export your pinned results and share them as documents and datafiles. But you can also share excerpts, searches, comparisons, and full constitutions very easily in your direct communications. The most direct way is to copy the URL. All URLs on the site are now deep links, which means that anything you surface on the site is preserved in that URL forever (well, “forever” by internet standards). Suppose you are interested in those constitutions that provide for secession (Scotland and Catalunya have many thinking along those lines). Here are those results to share in your blog post, email, Wikipedia entry, or publication. By the way, do you know which constitutions mention the word “internet?” Chances are you’ll be surprised.

So, please take Constitute with you to the beach this summer and tell us what you think. Any comments or suggestions to the site should be directed to our project address,
Zachary Elkins is Associate Professor at the University of Texas at Austin. His research interests include constitutional design, democracy, and Latin American politics. He co-directs the Comparative Constitutions Project.


VoxPopuLII is edited by Judith Pratt. Editors-in-Chief are Stephanie Davidson and Christine Kirchberger, to whom queries should be directed.


Learn to Make Windows 10 Apps With This Free Course From Microsoft

With Windows 10, Microsoft introduced Universal Apps, which you can write once and run on everything from laptops to a Raspberry Pi. Now, the company has a free course that teaches you how to write them.


