
by admin

OpenNebula: Native GlusterFS Image Access for KVM Drivers

March 10, 2014 in Syndicated

If you saw our Gluster Spotlight (“Integration Nation”) last week, you’ll recall that Javi and Jaime from the OpenNebula project were discussing their recent advances with GlusterFS and libgfapi access. Here’s a post where they go into some detail about it:

The good news is that for some time now qemu and libvirt have had native support for GlusterFS. This makes it possible for VMs running from images stored in Gluster to talk directly to the Gluster servers, making I/O much faster.

In this case, they use GFAPI for direct virtual machine access alongside the FUSE-based GlusterFS client mount for image registration, an example of using the best tool for each job. As they explain, OpenNebula administrators expect a mounted POSIX filesystem for many operations, so the FUSE-based mount fits their workflow best, while GFAPI is used where lower latency and better performance are called for.
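As a rough illustration (the host and volume names below are placeholders, not taken from the OpenNebula post), a qemu built with GlusterFS support can create and boot an image directly over libgfapi:

$ qemu-img create -f qcow2 gluster://gluster-server/datastore/vm01.qcow2 10G

$ qemu-system-x86_64 -m 1024 -drive file=gluster://gluster-server/datastore/vm01.qcow2,if=virtio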

Read the full post here.

The GFAPI integration is slated for the 4.6 release of OpenNebula. To get an early look at the code, check out their Git repository. Documentation is available here.


by admin

Gluster Spotlight: Integration Nation

March 6, 2014 in Syndicated

This week’s spotlight will be all about software integrated with storage services. GFAPI has opened the floodgates for this type of integration with GlusterFS. In this spotlight, we’ll hear from people who have been actively working on integrations with Apache CloudStack, Pydio, and OpenNebula.

Hear about how they integrated with GlusterFS and what they would suggest to others who wish to deploy an application stack with scale-out storage requirements.

As usual, you can request to be part of the live hangout, or follow along on YouTube. Q&A will be managed from the IRC channel #gluster-meeting.


by admin

Deploying Pydio in AWS with GlusterFS

March 6, 2014 in Syndicated

(This was originally published at the pyd.io web site)

Introduction

Deploying Pydio in a highly demanding environment (lots of users, tons of documents) to achieve a Dropbox-like server at scale requires a solid and elastic architecture.

As a distributed filesystem and software-defined storage solution, GlusterFS is a low-cost way of building a robust storage architecture on standard hardware. Pydio, for its part, has kept the filesystem driver at its core since the beginning of the project, making it a perfect match to deploy on top of Gluster, adding user-friendly features and enterprise-grade security.

Architecture

The principle here is to provide high availability and scalability by combining GlusterFS (for the storage part) and Pydio (for the access part) behind a load-balanced cluster of nodes.

We chose here to install Pydio (= compute) and the Gluster bricks (= storage) on the same instances, but any other configuration can be imagined: N dedicated nodes for storage with a subset of them running Pydio, or none of them running Pydio and K dedicated compute nodes, etc.

Also, we chose to set up two Gluster volumes (each of them assembling 2p bricks, i.e. p replicated pairs) for easier maintenance: one will contain the shared Pydio configuration, allowing a new Pydio node to start up without hassle, and one will contain the actual user data (files). On EC2, we will use EBS volumes as the bricks for the data volume, and the instances' local disk space for the config bricks. Finally, a database must be set up to hold the auxiliary Pydio data (namely users and ACLs, event logs, etc.). This DB can run on another instance, or possibly be installed on one of the nodes; it should be replicated and backed up for better failover.

The following diagram shows an overview of the target architecture.

Architecture schema

Launch Instances

Create two (or four) EC2 instances, attaching to each an EBS volume of X GB, depending on the size you require. We chose Ubuntu 12.04 as the OS. Use a fairly open security group for now; we’ll restrict permissions later. Instances will start with both PRIVATE and PUBLIC IPs/DNS names. Update the apt package lists with:

$ sudo apt-get update

GlusterFS Setup

Prepare Gluster bricks

We’ll use one brick for the actual data, and one for the Pydio configuration data.

$ sudo apt-get install glusterfs-server xfsprogs

$ sudo mkfs.xfs /dev/xvdb

$ sudo mkdir /mnt/ebs

$ sudo mount /dev/xvdb /mnt/ebs

Then add the following line to /etc/fstab to automount it at startup:

/dev/xvdb       /mnt/ebs        xfs defaults    0 0

Let’s also create a dedicated folder for the configs volume, on both nodes

$ sudo mkdir /var/confbrick

Create and start the volumes

Make the nodes recognize each other

On node 1

$ sudo gluster peer probe PRIVATE2

On node 2

$ sudo gluster peer probe PRIVATE1

$ sudo gluster volume create pydio-data replica 2 transport tcp PRIVATE1:/mnt/ebs PRIVATE2:/mnt/ebs

$ sudo gluster volume create pydio-config replica 2 transport tcp PRIVATE1:/var/confbrick PRIVATE2:/var/confbrick

$ sudo gluster volume start pydio-data

$ sudo gluster volume start pydio-config
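At this point, you can verify that the peers see each other and that both volumes are created and started:

$ sudo gluster peer status

$ sudo gluster volume info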

Mount the volumes on both nodes

If not already installed,

$ sudo apt-get install glusterfs-client

Create the folders /mnt/pydio-config and /mnt/pydio-data on both nodes.
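For example:

$ sudo mkdir -p /mnt/pydio-config /mnt/pydio-data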

Edit /etc/fstab again and add the following lines on each node:

PRIVATE1:/pydio-data /mnt/pydio-data glusterfs defaults,_netdev 0 0

PRIVATE1:/pydio-config /mnt/pydio-config glusterfs defaults,_netdev 0 0

Then remount everything:

$ sudo mount -a

Verify everything is mounted:

ubuntu@ip-10-62-94-160:/mnt/ebs$ df -h
Filesystem                                                Size  Used Avail Use% Mounted on
/dev/xvda1                                                7.9G  939M  6.6G  13% /
udev                                                      1.9G   12K  1.9G   1% /dev
tmpfs                                                     751M  168K  750M   1% /run
none                                                      5.0M     0  5.0M   0% /run/lock
none                                                      1.9G     0  1.9G   0% /run/shm
/dev/xvdb                                                 10G   33M   10G   1% /mnt/ebs
PRIVATE1:/pydio-data                                      10G   33M   10G   1% /mnt/pydio-data
PRIVATE1:/pydio-config                                    7.9G  939M  6.6G  13% /mnt/pydio-config

Make sure the web server will be able to use these two folders:

$ sudo chown -R www-data: /mnt/pydio-data

$ sudo chown -R www-data: /mnt/pydio-config

Now touch a file on one node and verify it’s on the other side.
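For example (the file name is arbitrary):

On node 1:

$ touch /mnt/pydio-data/replication-test

On node 2:

$ ls -l /mnt/pydio-data/replication-test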

Set up DB

For example, on node 1:

$ sudo apt-get install mysql-server

Set up a root password, and allow MySQL to listen for external connections by commenting out the following line in /etc/mysql/my.cnf:

#bind-address           = 127.0.0.1
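Then restart MySQL so the change takes effect (on Ubuntu 12.04):

$ sudo service mysql restart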

We will use the node’s EC2 PUBLIC address as the database host in the Pydio configuration.

Create a database
mysql> create database pydio;
mysql> grant all privileges on pydio.* to 'pydio'@'%' with grant option;

(Make sure to set a password, for example by appending IDENTIFIED BY 'your-password' to the GRANT statement, or update the password afterwards; otherwise the user is created with an empty password.)

Deploy Pydio

First Node

Get the script from https://raw.github.com/ajaxplorer/ajaxplorer-core/master/dist/scripts/glusterfs/pydio-gluster.sh and run it as root.

$ wget https://raw.github.com/ajaxplorer/ajaxplorer-core/master/dist/scripts/glusterfs/pydio-gluster.sh
$ chmod u+x pydio-gluster.sh
$ ./pydio-gluster.sh

Once finished, start or restart Apache:
$ apachectl start
Go to the node’s public IP in a web browser (http://PUBLIC_IP1/pydio/) and follow the standard installation process. Set up the admin login and global options, and for Configurations Storage choose Database > MySQL, using the public IP of the DB node as the server host.

Installation Process

Then save and connect as admin, switch to the “Settings” workspace, and customize the configuration as you like: you can activate additional plugins, customize the logo and application title, etc. The interesting part of doing this now is that any changes will automatically propagate to the other nodes you bring up.

Settings Panel Sample

 

Second Node

As they share their base configuration through the pydio-config Gluster volume, the next nodes will directly inherit the first node’s configuration. So to fire up a new node, all you have to do is run the script:

$ wget https://raw.github.com/ajaxplorer/ajaxplorer-core/master/dist/scripts/glusterfs/pydio-gluster.sh
$ chmod u+x pydio-gluster.sh
$ ./pydio-gluster.sh

Then verify that Pydio is up and running and that you can log in with the same credentials at http://PUBLIC_IP2/pydio/.

Load Balancer

AWS LoadBalancer

We could use a custom compute node equipped with HAProxy or similar software, but since this tutorial runs on AWS, we will use the service provided for that purpose: the Elastic Load Balancer. In your AWS console, create a load balancer forwarding port 80 to instance port 80.

Creating a LoadBalancer

To configure how the health check is performed (i.e. how the LB checks that instances are alive), make sure to change the name of the checked file to check.txt. This is important because our install scripts configure the nodes’ Apache servers to skip logging calls to this file, to avoid filling the logs with useless data (the check happens every 5 seconds).
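If you prefer the command line, roughly the same setup can be sketched with the AWS CLI (classic ELB); the load balancer name, availability zone, instance IDs and the /pydio/check.txt path below are placeholder assumptions to adapt to your environment:

$ aws elb create-load-balancer --load-balancer-name pydio-lb \
    --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
    --availability-zones us-east-1a

$ aws elb register-instances-with-load-balancer --load-balancer-name pydio-lb \
    --instances i-11111111 i-22222222

$ aws elb configure-health-check --load-balancer-name pydio-lb \
    --health-check Target=HTTP:80/pydio/check.txt,Interval=5,Timeout=3,UnhealthyThreshold=2,HealthyThreshold=2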

NOTE: If you have an SSL certificate (which is definitely a good security practice), you will install it on this load balancer and redirect port 443 to port 80: internal communications do not need to be encrypted.

Session Stickiness

Once the load balancer is created, edit the “Stickiness” parameter of the forwarding rules and choose “Enable Application Generated Cookie Stickiness”, using “AjaXplorer” as the cookie name. This is important: although clients will be randomly redirected to instances on their first connection, once a session is established it will always stay on a given instance.
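The equivalent with the AWS CLI would look roughly like this (names are again placeholders):

$ aws elb create-app-cookie-stickiness-policy --load-balancer-name pydio-lb \
    --policy-name pydio-sticky --cookie-name AjaXplorer

$ aws elb set-load-balancer-policies-of-listener --load-balancer-name pydio-lb \
    --load-balancer-port 80 --policy-names pydio-sticky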

Session Stickiness

 

NOTE: Session stickiness saves us from having to set up a session-sharing mechanism between the nodes, but this could be done, for example, by adding a memcached server.

Outside world address

Now that our nodes will be accessed through a proxy rather than through their “natural” public IPs, we need to tell Pydio about it. This is necessary to generate correct sharing URLs and to send emails pointing to the correct URL. Without it, Pydio would try to auto-detect the IP and would probably end up exposing the PRIVATE IP of the current working node.

Log in to Pydio as admin and go to Settings > Global Configurations > Pydio Main Options. Update the Server URL and Download URL fields with the real addresses and save. Then go to a file workspace, try to share a file or a folder, and verify that the link is correct and working.

Pydio Main Options, updated with Load Balancer address

Conclusion: adding new nodes on-demand

Well, that’s pretty much it. We could refine this architecture on many points, but basically you’re good to go.

So what do you do to add a new node? Basically, you’ll have to:

[if you need more storage]

 

  1. Fire up a new instance with the Ubuntu OS
  2. Configure Gluster to add it as a new brick to the volume (see the sketch after this list)

[if you need more compute]

  1. Fire up a new instance with the Ubuntu OS
  2. Configure the GlusterFS client to mount the volumes
  3. Run the Pydio script to deploy and load the configs
  4. Add this node to the load balancer’s instance list
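For the extra-storage case, the Gluster side of the operation would look something like the sketch below (the PRIVATE3/PRIVATE4 hostnames and the /mnt/ebs brick path follow the conventions used above; since pydio-data is a replica 2 volume, bricks must be added in pairs):

$ sudo gluster peer probe PRIVATE3
$ sudo gluster peer probe PRIVATE4

$ sudo gluster volume add-brick pydio-data PRIVATE3:/mnt/ebs PRIVATE4:/mnt/ebs

$ sudo gluster volume rebalance pydio-data start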

Wishing you a happy scaling!


by admin

Vote for Gluster-related OpenStack Summit Talks

February 28, 2014 in Syndicated

Here are the Gluster-related abstracts that have been submitted for the OpenStack Summit in May in Atlanta. Check them out and vote!

  • Use Case: OpenStack + GlusterFS on TryStack.org
    • “The Gluster community has made huge strides to support backing an OpenStack installation’s storage with GlusterFS. TryStack.org has implemented GlusterFS as its storage backend.

      In this presentation we’ll walk through the configuration details to implement GlusterFS as OpenStack’s storage backend.”

  • A Technical Tour of OpenStack Swift Object Storage Volume Extensions
    • “Take developers through a tour of existing DiskFile backends for OpenStack Object Storage (Swift). The DiskFile interface in Swift is an API for changing how objects are stored on storage volumes. Swift provides a default implementation over XFS (Posix) and a reference in-memory example to help folks get started.”
  • Manila: The OpenStack File Share Service – Technology Deep Dive
    • “This presentation introduces Manila, the new OpenStack File Shares Service. Manila is a community-driven project that presents the management of file shares (e.g. NFS, CIFS) as a core service to OpenStack. Manila currently works with NetApp, Red Hat Storage (GlusterFS) and IBM GPFS (along with a reference implementation based on a Linux NFS server).”
  • Sharing GlusterFS Storage Servers with Openstack Compute nodes via Docker
    • “The main focus of this session will be to explain how Docker can be leveraged to utilize unused cycles on GlusterFS Storage nodes for additional compute nodes in an Openstack environment. Docker is an application container and can host both GlusterFS Storage node as well as Openstack compute nodes in a single physical server.”
  • Best practices for deploying and using Gluster for Storage in OpenStack environments
    • “Gluster has a number of exciting new features such as NUFA (Non Uniform File Access), Granular geo-replication, Unified file, block & object storage access and data tiering.

      In this presentation we discuss these new features and introduce best practices based on our own experiences as well as those of customers for deploying and using Gluster in OpenStack environments.”

  • Extending GlusterFS for OpenStack
    • “There is a need to extend GlusterFS storage availability to other operating systems and hypervisors. In this session, you will learn about a generalized block solution for Gluster that works for any block-based application (Xen, HyperV, VirtualBox, VMware, tape). We will compare different interconnect choices between the GlusterFS server and OpenStack client, such as iSCSI, FCoE, and ‘gluster native’.”
  • Breaking the Mold with OpenStack Swift and GlusterFS
    • “Red Hat uses OpenStack Swift as the object storage interface to GlusterFS. Instead of reimplementing the Swift API, Red Hat is participating in the OpenStack Swift community to ensure that GlusterFS can take full advantage of the latest Swift features. This is absolutely the right way to pair Swift with another storage system.”

 


by admin

The Rise of Open Source Analytics Software

February 25, 2014 in Syndicated

I was pleased to read about the progress of Graylog2, ElasticSearch, Kibana, et al. in the past year. Machine data analysis has been a growing area of interest for some time now, as traditional monitoring and systems management tools aren’t capable of keeping up with All of the Things that make up many modern workloads. And then there are the more general purpose, “big data” platforms like Hadoop along with the new in-memory upstarts sprouting up around the BDAS stack. Right now is a great time to be a data analytics person, because there has never in the history of computing been a richer set of open source tools to work with.

There’s a functional difference between what I call data processing platforms, such as Hadoop and BDAS, and data search presentation layers, such as what you find with the ELK stack (ElasticSearch, Logstash and Kibana). While Hadoop, BDAS, et al. are great for processing extremely large data sets, they’re mostly useful as platforms for people Who Know What They’re Doing (TM), i.e. math and science PhDs and analytics groups within larger companies. But really, the search and presentation layers are, to me, where the interesting work is taking place these days: it’s where Joe and Jane programmer and operations person are going to make their mark on their organization. And many of the modern tools for data presentation can take data sets from a number of sources: log data, JSON, various forms of XML, event data piped directly over sockets or some other forwarding mechanism. This is why there’s a burgeoning market around tools that integrate with Hadoop and other platforms.

There’s one aspect of data search presentation layers that has largely gone unmentioned. Everyone tends to focus on the software, and if it’s open source, that gets a strong mention. No one, however, seems to focus on the parts that are most important: data formats, data availability and data reuse. The best part about open source analytics tools is that, by definition, the data outputs must also be openly defined and available for consumption by other tools and platforms. This is in stark contrast to traditional systems management tools and even some modern ones. The most exciting premise of open source tooling in this area is the freedom from the dreaded data roach motel model, where data goes in, but it doesn’t come out unless you pay for the privilege of accessing the data you already own. Recently, I’ve taken to calling it the “skunky data model” and imploring people to “de-skunk their data.”

Last year, the Red Hat Storage folks came up with the tag line of “Liberate Your Information.” Yes, I know, it sounds hokey and like marketing double-speak, but the concept is very real. There are, today, many users, developers and customers trapped in the data roach motel and cannot get out, because they made the (poor) decision to go with a vendor that didn’t have their needs in mind. It would seem that the best way to prevent this outcome is to go with an open source solution, because again, by definition, it is impossible to create an open source solution that creates proprietary data – because the source is open to the world, it would be impossible to hide how the data is indexed, managed, and stored.

In the past, one of the problems is that there simply weren’t a whole lot of choices for would-be customers. Luckily, we now have a wealth of options to choose from. As always, I recommend that those looking for solutions in this area go with a vendor that has their interests at heart. Go with a vendor that will allow you to access your data on your terms. Go with a vendor that gives you the means to fire them if they’re not a good partner for you. I think it’s no exaggeration to say that the only way to guarantee this freedom is to go with an open source solution.

Further reading:

 


by admin

Gluster Spotlight: Citrix, FASRC, Avati and Theron

February 5, 2014 in Syndicated

Join us on Friday, February 7, 1pm EST / 10am PST / 18:00 GMT for a very special Gluster Spotlight featuring our 4 new board members: James Cuff (Harvard FASRC), Mark Hinkle (Citrix), Anand Avati (Red Hat – individual contributor) and Theron Conrey (individual contributor).

As always, you can watch the video feed here and ask questions on the #gluster-meeting IRC channel on the Freenode network. See you there!


by admin

Citrix and Harvard FASRC Join Gluster Community; Board Expands

February 5, 2014 in Syndicated

Citrix, Harvard University FASRC and long-time contributors join the Gluster Community Board to drive the direction of open software-defined storage

February 5, 2014 – The Gluster Community, the leading community for open software-defined storage, announced today that two new organizations have signed letters of intent to join: Citrix, Inc. and Harvard University’s Faculty of Arts and Science Research Computing (FASRC) group. This marks the third major expansion of the Gluster Community in governance and projects since mid-2013. Monthly downloads of GlusterFS have tripled since the beginning of 2013, and traffic to gluster.org has increased by over 50% over the previous year. There are now 45 projects on the Gluster Forge and more than 200 developers, with integrations either completed or in the works for Swift, CloudStack, OpenStack Cinder, Ganeti, Archipelago, Xen, QEMU/KVM, Ganesha, the Java platform, and Samba, with more to come in 2014.

Citrix and FASRC will be represented by Mark Hinkle, Senior Director of Open Source Solutions, and James Cuff, Assistant Dean for Research Computing, respectively, joining two individual contributors: Anand Avati, Lead GlusterFS Architect, and Theron Conrey, a contributing speaker, blogger and leading advocate for converged infrastructure. Rounding out the Gluster Community Board are Xavier Hernandez (DataLab), Marc Holmes (Hortonworks), Vin Sharma (Intel), Jim Zemlin (The Linux Foundation), Keisuke Takahashi (NTTPC), Lance Albertson (The Open Source Lab at Oregon State University), John Mark Walker (Red Hat), Louis Zuckerman, Joe Julian, and David Nalley.

Citrix

Citrix has become a major innovator in the cloud and virtualization markets. They will drive ongoing efforts to integrate GlusterFS with CloudStack (https://forge.gluster.org/cloudstack-gluster) and the Xen hypervisor. Citrix is also sponsoring Gluster Community events, including a Gluster Cloud Night at their facility in Santa Clara, California on March 18.

Harvard FASRC

The research computing group at Harvard has one of the largest known deployments of GlusterFS in the world, pushing GlusterFS beyond previously established limits. Their involvement in testing and development has been invaluable for advancing the usability and stability of GlusterFS.

Anand Avati

Anand Avati was employee number 3 at Gluster, Inc. in 2007 and has been the most prolific contributor to the GlusterFS code base as well as its most significant architect over the years. He is primarily responsible for setting the roadmap for the GlusterFS project. Avati is employed by Red Hat but is an individual contributor for the board.

Theron Conrey

Theron became involved in the Gluster community when he started experimenting with the integration between oVirt (http://ovirt.org/) and GlusterFS. Long a proponent of converged infrastructure, Theron brings years of expertise from his stints at VMware and Nexenta.

Supporting Quotes

John Mark Walker, Gluster Community Leader, Red Hat

“The additions of Citrix and Harvard FASRC to the Gluster Community show that we continue to build momentum in the software-defined storage space. With the continuing integration with all cloud and big data technologies, including the Xen Hypervisor and CloudStack, we are building the default platform for modern data workloads.”

Mark Hinkle, Senior Director, Open Source Solutions, Citrix

“We see an ever increasing hunger for storage solutions that have design points that mirror those in our open source and enterprise cloud computing efforts. Our goal is to enable many kinds of storage with varying levels of utility and we see GlusterFS as helping to pioneer new advances in this area. As an active participant in the open source community we want to make sure projects that we sponsor like Apache CloudStack and the Linux Foundation’s Xen Project are enabled to collaborate with such technologies to best serve our users.”

James Cuff, Assistant Dean for Research Computing, Harvard University

“As long-term advocates of both open source and open science initiatives at scale, Research Computing are particularly excited to participate on the Gluster Community Governing Board. We really look forward to further accelerating science and discovery through this important and vibrant community collaboration.”

 

***The OpenStack mark is either a registered trademark/service mark or trademark/service mark of the OpenStack Foundation, in the United States and other countries, and is used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

***Gluster and GlusterFS are trademarks of Red Hat, Inc.

***Xen and Linux are trademarks of The Linux Foundation

***Apache Cloudstack is a trademark of the Apache Software Foundation


by admin

Gluster Cloud Night Amsterdam

February 3, 2014 in Syndicated

Join us on March 4 for the Gluster Community seminar and learn how to improve your storage.
This half day seminar brings you in-depth presentations, use cases, demos and developer content presented by Gluster Community experts.

REGISTRATION

Register today for this free half-day seminar and reserve your seat since spaces are limited. Click here to register.

We look forward to meeting you on March 4th!

AGENDA

13:30 – 13:45 Registration
13:45 – 14:15 The State of the Gluster Community
14:15 – 15:30 GlusterFS for SysAdmins, Niels de Vos, Red Hat
15:30 – 15:45 Break
15:45 – 16:30 Adventures in Cloud Storage with OpenStack and GlusterFS
Tycho Klitsee (Technical Consultant and Co-owner) of Kratz Business Solutions
16:30 – 17:15 Gluster Forge Demos, Fred van Zwieten, Technical Engineer, VX Company and Marcel Hergaarden, Red Hat
17:15 – 18:00 Networking Drinks


by admin

Gluster Spotlight on James Shubin: Puppet-Gluster, Vagrant and GlusterFS Automation

January 20, 2014 in Syndicated

***UPDATE: Due to weather-related flight cancelations and rebooking, we had to push this back to Thursday, January 23, at noon PST/3pm EST/20:00 GMT***

James Shubin is known in the Gluster community for his work on the Puppet-Gluster module.

Recently, he’s begun mixing powerful cocktails of Puppet and Vagrant into recipes for automated Gluster deployments. See, e.g.:

Building Base Images for Vagrant with a Makefile

and

Testing GlusterFS During GlusterFest

This will be quite a fun spotlight, and very much worth your while. As usual, join the #gluster-meeting channel on the Freenode IRC network to participate in the live Q&A.

About Gluster Spotlight

Gluster Spotlight is a weekly Q&A show featuring the most exciting movers and shakers in the Gluster Community. If you don’t catch them live, you can always watch the recordings later.


by admin

GlusterFest Weekend is Here – Jan 17 – 20

January 16, 2014 in Syndicated

As I mentioned yesterday, the GlusterFest is nigh. This time, we’ll break out testing into two types:

  • Performance testing
  • Feature testing

To learn about the GlusterFest and what it is, visit the GlusterFest home at gluster.org/gfest.

Remember that if you file a bug that is verified by the Gluster QE team, you’ll win a t-shirt plus other swag.

PERFORMANCE TESTING

We are lucky in that two individuals have stepped up with tools to help with performance testing. One is James Shubin with his Puppet-Gluster module:
https://forge.gluster.org/puppet-gluster/

Together with his blog posts on puppet-gluster + vagrant, you should have an easy way to deploy GlusterFS:
Automatically deploying GlusterFS with Vagrant and Puppet

Also, Ben England recently released some code for his Smallfile performance testing project, which targets metadata-intensive workloads:

http://forge.gluster.org/smallfile-performance-testing

He also wrote up a nice primer on performance testing on the Gluster.org wiki that discusses iozone, smallfile, and how to utilize performance testing in general:
http://www.gluster.org/community/documentation/index.php/Performance_Testing
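As a rough illustration of the kind of run the primer walks through (the mount point below is a placeholder for a GlusterFS client mount, and the flags are standard iozone options rather than anything GlusterFest-specific):

$ iozone -a -g 4G -i 0 -i 1 -f /mnt/glustervol/iozone.tmp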
Please follow the instructions on the GlusterFest page (gluster.org/gfest) and report your results there. Some of the test results are quite large, so you will want to report them on a separate page, either on the Gluster.org wiki or on a paste site of your choosing, such as fpaste.org.
Please file any bugs and report them on the gluster-devel list, as well as providing links on the GlusterFest page.

FEATURE TESTING

In addition to performance, we have new features in 3.5 which need some further testing. Please follow the instructions on the GlusterFest page and add your results there. Some of the developers were kind enough to include testing scenarios with their feature pages. If you want your feature to be tested but didn’t supply any testing information, please add it now.

The GlusterFest begins at 00:00 GMT/UTC (today, January 17) and ends at 23:59 GMT/UTC on Monday, January 20.
Rev your engines and get ready for some testing!