Steffens side notes..

Friday, February 5, 2016

Reflections on the XMPP Summit 2016

I've been writing most of this on the airplane. I had to, because my head is full of ideas and new thoughts are buzzing! Now I have finally had the time to write down the last bits that were stuck in my brain.

So I arrived in Brussels last Wednesday (27th Jan.) so I could get up early the next day for the XMPP Summit. We were spread out over different hotels, so it was a bit messy and I thought I would have to be by myself all Wednesday night, but happily +Winfried Tilanus showed up for a couple of beers and so the XMPP discussion and summit began... Yeah!

The Summit

The Summit started on Thursday. We were not many, but the discussions and presentations were really good and very relevant for the future of XMPP. Even though my head was not that fresh (thanks for the beers +Winfried Tilanus - fun as always), I quickly got psyched about the agenda. +Kevin Smith banged the gavel, took the lead and we began the summit.

+Dave Cridland started out by presenting the account model (PAM) and an intro to MIX (see XEP-0369: Mediated Information eXchange (MIX)). I really like the idea behind MIX: to have a context around a mediating service where you can subscribe to multiple types of nodes, like messages, presence, participants etc. It really serves as a multi-purpose service, but the low-hanging fruit would of course be MUC (Multi-User Chat), which would be easy to model through MIX and Pub/Sub (http://www.xmpp.org/extensions/xep-0060.html).

After some longer discussions about MIX, Winfried talked about his new and potentially huge project regarding the mental health industry and XMPP. It is very inspiring to hear about new real-life projects coming up. I like the idea of open communities, open distributed data and social networks. It seems like a rather huge task, but an interesting one! :-)

On Friday we started discussing how to make reconnects from the client to the server faster, with fewer roundtrips. I and many others know that a few implementations already do all kinds of tricks to make it simpler and faster - but this was about standardising it.

We then started to talk about the never-ending story: end-to-end encryption (E2E). We talked for a while about how to tackle this subject and then +Dave Cridland threw in an interesting idea. He has been reading up on some papers and has discussed a crypto system called proxy re-encryption with some security people. He proposed the idea and showed how it could be used to solve the E2E problems in XMPP. It was really interesting, but also state of the art. I personally hope that someone will implement it and try it out this year, even though no standard exists yet. As I recall, this solution would work if we start to use PAM and move the account state to the server instead of the client. Since both PAM and proxy re-encryption are quite new, we probably won't see it before the end of this year - but hopefully people will start to experiment!

After E2E and MIX we split up and talked a bit more about the two biggest issues at the summit: MIX (again) and E2E encryption with its different mechanisms. I followed the E2E discussion, where we talked a bit about OTR and OMEMO and how they would fit into different scenarios. I am not that into crypto standards, so I did not contribute that much, but I followed the discussion with quite some interest.

Besides the official summit talks, I always love to mingle and be inspired. During the summit I started to implement the push notification extension for the Tigase server - something I actually needed and something that was not yet available for Tigase. It's almost done - and I'll make a PR to the Tigase people once it is well tested, so you guys can enjoy it as well. :-)

I also mingled with, talked to and got inspired by the really nice guys from MongooseIM (+Michał Piotrowski and +Nicolas Vérité) and +Tobias M. We talked a bit about clustering, load balancing and load testing in XMPP systems, because these are some of my favourite XMPP topics. My idea was to make a very simple load balancer based on the from attribute in the initial stream header. This would at least put all of your clients onto the same server and reduce the traffic between cluster nodes a bit. I also have a customer who only uses communication between its own clients (IoT), and with this load balancer in place we would not even have to run the XMPP servers as a cluster. So far I have only investigated how to implement this in HAProxy or Nginx (which are my preferred proxies). The next step is to implement it.
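
The routing idea can be sketched in a few lines of Java. This is only an illustration under my own assumptions, not what would end up in HAProxy or Nginx: the class and method names are made up, the backend names are placeholders, and I assume that "same server" means "all clients of the same domain go to the same backend". The from attribute is optional on the initial stream header, so the sketch falls back to the first backend when it is missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: choose a backend from the 'from' attribute of the opening stream header. */
final class StreamFromBalancer {
    // Matches from='...' or from="..." inside the <stream:stream ...> header.
    private static final Pattern FROM = Pattern.compile("\\sfrom=(['\"])(.*?)\\1");

    private final List<String> backends;

    StreamFromBalancer(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    /** Returns the value of the 'from' attribute, or null when the header has none. */
    static String extractFrom(String streamHeader) {
        Matcher m = FROM.matcher(streamHeader);
        return m.find() ? m.group(2) : null;
    }

    /** Reduces user@domain/resource to the lower-cased domain part. */
    static String domainOf(String jid) {
        int slash = jid.indexOf('/');
        String bare = slash >= 0 ? jid.substring(0, slash) : jid;
        int at = bare.indexOf('@');
        String domain = at >= 0 ? bare.substring(at + 1) : bare;
        return domain.toLowerCase(Locale.ROOT);
    }

    /** The same domain always maps to the same backend; headers without 'from' go to the first one. */
    String backendFor(String streamHeader) {
        String from = extractFrom(streamHeader);
        if (from == null) {
            return backends.get(0);
        }
        int index = Math.floorMod(domainOf(from).hashCode(), backends.size());
        return backends.get(index);
    }
}
```

A real proxy would also have to deal with the fact that it can only read the header while the stream is still unencrypted, which this sketch ignores.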

Even though only about 15 people attended the summit instead of the usual 30-40, we still had a lot of really good discussions and got some real work done (proto-XEPs etc). Good karma to all of you guys!

Next post will be about FOSDEM..

E2E Links

https://conversations.im/omemo/

https://en.wikipedia.org/wiki/Proxy_re-encryption

http://www.semper.org/sirene/people/gerrit/papers/divertproxy.pdf

https://isi.jhu.edu/~mgreen/prl/index.html

Friday, December 13, 2013

XMPP / IoT MEETUP in UK (2nd December)

The original arrangement by Surevine (mostly Laura and Lloyd) was that the venue would be the Mozilla HQ and the target group mainly XMPP people, with a sidetrack of IoT.

The Setup

Since I have done some IoT stuff with TVs and set-top boxes, I found it pretty obvious to do a talk about that, so a couple of days before the meetup I started to work on some ideas and some slides that I could present. I talked with Lloyd (the co-organiser of the meetup) and he liked the idea.

But then, a couple of days before the meetup, the venue changed. Mozilla cancelled our room, so Surevine had to find another venue.

Laura and Lloyd then actually managed to get us into a big IoT event that was going on the same day (@thingmonk) and the day after.

Since the venue changed, the audience changed a lot - now the people coming to the event were IoT people first, not XMPP people. Lloyd and I discussed whether it would be better to present a general (but short) intro to XMPP. So my talk was cancelled. That was OK, even though it could have been fun to see people's reactions to the fact that XMPP is already used in TVs and set-top boxes, and to the environment behind that.

So now I could relax in the last days before the meetup in London - preparing my wedding (so actually I would not have had the time to do the presentation after all).

Going to London

So I met up with one of my old friends and we went to London the day before, Sunday the 1st of December.

For me personally it was not only about the presentations, but more about meeting people and having a chat before and afterwards.

When my friend and I arrived Sunday morning, we wandered around, located our hotel and got some food and a drink. Later the same evening we met up with @JoachimLindborg (who was doing a presentation on XMPP and IoT). We had a couple of beers and a good talk about what XMPP could bring to the IoT people, and some interesting perspectives were discussed. (I am mostly inspired when I get a good beer) :-)

Monday we went to Google Campus, working a bit and drinking coffee. We met up with Lloyd, had a chat and worked a bit more.

The XMPP / IOT Meetup

The meetup was in the evening, so I managed to do a bit of work and research before we went off to the event.
At 17 o'clock we wandered off to Shoreditch (Works Village Hall), where the @thingmonk event was being held. When we arrived we saw only candlelight from the outside. I thought that looked pretty cozy and like a fun thing to do, but we actually found out that it was caused by a power outage.. Damn!

People were suggesting all kinds of things to get us some electricity. We were even looking for a diesel engine. That would have been quite a paradox, when people at the same time were talking about IoT, green tech etc. :-)

We ended up with electricity from the neighbours (a local bar I think). So the tables were plastered with power cables and whatnot (see picture).

With the electricity going again (at a reduced level, and with candlelight) we could begin our little meetup.
The meetup started like any good meetup, with beer and burgers (a gesture from Surevine).
So after chewing through some burgers, chips and some beers we began with the presentations.
The first presentation was from Lloyd, who did a quick tour of XMPP and how it all fits together in general. The presentation can be found here.

It's actually pretty hard to compress a general intro to XMPP into a half-hour slot - but Lloyd did a good job of that. Last time I tried to do an XMPP intro (see my presentation here) it lasted for 1.5 hours. :-) Maybe it's just me who likes to talk and do hands-on stuff. ha ha.
After Lloyd's talk, the electricians finally came to the rescue and saved the rest of the day. They somehow got the electricity going again. Hurray!!!

The timing couldn't have been better: Joachim was on and started his presentation, and we could finally attach some more devices (mainly Raspberry Pis that Joachim had brought).
The presentation was about how you can control and meter your connected devices. We had around 10 Raspberry Pis connected at the meetup and some connected in Sweden (via XMPP of course). After a walkthrough of the architecture behind it and a presentation of some new XEPs (XMPP extensions) for IoT, we started hacking some Python.
Some of the stuff was running locally on the attendees' machines and some on the Raspberry Pis. People started connecting, and we noticed the lights switching on and off in Sweden, people getting metering data back etc.
When the presentation was over, people stayed and hacked for a bit over a beer (great porters by the way!). People exchanged ideas, asked questions and, I hope, got some understanding of what XMPP can bring to IoT.
Personally I had the most fun just talking with people about XMPP over a beer. I chatted a bit with Florian and Matthew about servers, and also about running XMPP without a server (link-local stuff like the Bonjour protocol).

All in all it was a small but fun event. I think the turnout could have been even better, but that might just have had something to do with the power outage and the somewhat chaotic start we had. :-)

Some of the stuff that Joachim presented and the hacking we did can be found here:

I am still a bit eager to present my ideas around media devices (TVs, set-top boxes etc) and second screen. There have been some proposals like the DIAL protocol, but most of them have issues that are actually already solved in XMPP.

So I think I might try to sign up for a lightning talk at FOSDEM about IoT, TVs and second screen (and the like) over XMPP. It could also be cool to talk with some of my fellow XSF guys about my ideas and the implementations I have already done, to see if they could be formalised and standardised into a XEP.

-Cheers and see you in Brussels (and London?)!

/Steffen

@zooldk

Thursday, June 6, 2013

Couchbase - use it for Logging!

My Introduction to Couchbase

For a short introduction - I've been using Couchbase for almost half a year now, but I've actually been around that "grey" couch area all the way back to 2008/2009, when I played with CouchDB, using it for document storage.

But I introduced Couchbase at my current company (I am working as a consultant), because we wanted:

To store many JSON documents
To be fast!
Make secondary indexes on the arbitrary documents
Many clients should be able to fetch the documents - so caching is needed.
The database should be able to fail over somehow
Easy to setup, scale and understand

We got almost everything we needed for our architecture, so we based this setup on Couchbase and have now used it for a couple of months in production. We quite easily set up a cluster of 3 nodes and used it to take in large amounts of metadata for a Video On Demand (VOD) workflow engine. We have tested it and got quite good results in operations per second (for both writes and reads - we mainly write a lot in this scenario).

Logging - problem statement

The system my customer currently has is a broker/workflow system for VOD that integrates with many other systems and platforms, which means a lot of stuff can go wrong. And this is not always easy to spot in the current system. So I want a good logging mechanism that enables:

Application developers to investigate incidents.
The customer support team to investigate a lost transaction reported in a customer complaint.
The security team to do security forensics.
Business managers demanding statistics to easily spot trends and get some kind of business intelligence (BI).

So I've been using couchbase for storing trivial document data and all is working out very fine indeed.

But then it hit me: why not use it for arbitrary logging? Couchbase sure looked fast enough to store a massive load of documents coming in over a short amount of time. And since we already have a cluster of Couchbase nodes, it would be straightforward to do - why not use what you already have? My idea could also be fun and rewarding!

The Couchbase Idea

Using Couchbase also gives me the ability to shape the log structure as I want and to change it dynamically - if I want! This is one of the true powers of a document database such as Couchbase. And the documents that are stored are first cached in the memcached layer that Couchbase automatically has in front - so that means speed when you are searching your logs!

Even though a lot of different solutions exist for logging, I've always had the feeling that they have some shortcomings. Firstly, we have a cluster of application servers behind a load balancer. With ordinary logging to the file system, this means that our logs are spread across the application servers - you have to log in to each one, look at the log, compare etc. This of course becomes more and more complicated the more application servers you add.

Many remote loggers do exist. These do their logging either through a normal RDBMS database or through a protocol for exchanging logs, such as syslog. I've had some problems using syslog in my environment, and since we did not have a syslog server, it was even harder to use in this scenario.

A lot of these log services store the logs for safekeeping and make them available later. This often means storing the logs in a centralised place and parsing them later (possibly only when needed).

I wanted my logging to be instant, so that when a new piece of software is released, we can see the impact and any errors instantly! I also wanted to be able to evict logs that lose their significance over time (say info or other low levels). For this I will use the expiry time that you can set for each item that you persist (it applies both in memory and on disk).

Almost all of the Java companies I've been working for as a freelancer have been using some sort of Log4j library - meaning Log4J directly or through an SLF4J facade. This company is doing the same, using Log4J (version 1.2).

Log standards?

So instead of implementing yet another REST service, where each piece of software on the application servers has to rewrite its logging technique, why not just use the "standard" Java logging mechanism - like Log4j in this case?

Speaking of standards, I would also like to have my log documents stored in some kind of standard format. Coming from an open standards world (XMPP advocate and XSF member), I know how much it means to comply with a standard so that it is easy to interoperate. Doing so will also make it easy to export data to and import data from other loggers. No really widespread standards exist, except syslog, which I have tried to use a couple of times now.

Actually, one shortcoming I've had with syslog and Java was the length of a stacktrace. Stacktraces can be enormous and syslog sometimes can't handle that - so it does have some limitations! I don't like limitations, so being able to "shape" my own log document structure or standard (if any) was a very important requirement as well.

Summary of requirements

So let's sum up the requirements for my idea of a centralised logger:

Easy to setup!
Log4j compliant (which makes it directly pluggable from a Java perspective)
Fast and distributed
Close to instant logging
Easy to shape the log documents
Easy to index the log documents as we go along (some want incident reports, some want BI, some want forensics)
Some kind of standardised log format

Log4j-couchbase (https://github.com/zooldk/log4j-couchbase)

So as you can guess, I used Couchbase for storage and implemented a Log4J appender that makes it possible to use our current Log4j setup and format the log documents the way we wish.

Because I like standards and because I don't want to reinvent the wheel, I decided to go for the Logstash format as a "standard" logging format. I have researched this a bit, and Logstash was the only format I found that looked simple and widely used enough for my purpose.

So in half a day or so I created the log4j-couchbase project. It's a simple Log4j appender that formats the logs as Logstash events (or almost). :-)

I've made some simple design decisions, such as having the log appender add the logs asynchronously instead of waiting for an OK from Couchbase. That is, I am not calling the get() method on my OperationFuture, so the appender should be running in a non-blocking mode for performance. Later I might add an option to turn on synchronous mode. But for now I find performance more important for logging than the robustness that synchronous mode offers.
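
To make the difference between the two modes concrete, here is a small self-contained sketch. It is not the appender's real code: FakeBucket is a hypothetical in-memory stand-in for the Couchbase client, and CompletableFuture stands in for the OperationFuture mentioned above. The only point is that the asynchronous path returns right away, while the synchronous path blocks on get().

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;

final class FireAndForgetSketch {
    /** Hypothetical stand-in for a Couchbase bucket: an in-memory map written on another thread. */
    static final class FakeBucket {
        final Map<String, String> docs = new ConcurrentHashMap<>();

        CompletableFuture<Boolean> set(String key, String json) {
            return CompletableFuture.supplyAsync(() -> {
                docs.put(key, json);
                return Boolean.TRUE;
            });
        }
    }

    /** Asynchronous mode: hand the document over and return without waiting for the result. */
    static CompletableFuture<Boolean> appendAsync(FakeBucket bucket, String key, String json) {
        return bucket.set(key, json);
    }

    /** Synchronous mode: block on get() until the store has answered. */
    static boolean appendSync(FakeBucket bucket, String key, String json) {
        try {
            return bucket.set(key, json).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            return false;
        } catch (ExecutionException e) {
            return false; // the write failed; a logger should not throw here
        }
    }
}
```

The trade-off is visible in the return types: appendSync can tell the caller whether the write succeeded, while appendAsync only hands back a future that nobody is obliged to look at.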

Usage

If you are using Maven you should be able to add the following dependency to your pom.xml. Otherwise compile the project by running "mvn clean install" and put the jar file (located under target) on the classpath of your project.

<dependency>
  <groupId>dk.braintrust.os.logger</groupId>
  <artifactId>log4j-couchbase-logger</artifactId>
  <version>0.4.0-SNAPSHOT</version>
</dependency>

I have uploaded the snapshot of version 0.4.0 to Maven Central and it should be promoted to a released version soon. Until then you have to live with a snapshot version.

Now that the class is added to your project, you only need to add the log4j.properties file. This file will set up the properties for the Couchbase appender. An example could be:

log4j.rootLogger=DEBUG, COUCHBASE
log4j.appender.COUCHBASE=dk.braintrust.os.logger.CouchBaseLogAppender
log4j.appender.COUCHBASE.hosts=localhost
log4j.appender.COUCHBASE.port=8091
log4j.appender.COUCHBASE.password=
log4j.appender.COUCHBASE.loggingBucket=default
log4j.appender.COUCHBASE.developmentMode=true
log4j.appender.COUCHBASE.eviction=0
log4j.appender.COUCHBASE.layout=dk.braintrust.os.logger.JsonEventLayout

Remember to substitute your Couchbase server settings accordingly. Also remember to add a console appender to your root logger if you would like to see what is happening on your console.

If you use an eviction value of 0 (eviction=0), your logs will be persisted forever. If you set the value (in seconds) higher than zero, the document will be evicted after that number of seconds. E.g. if eviction=60, the documents will be stored for only 60 seconds. In the next major version (1.0.0) of the log appender, I will try to add an even more clever eviction strategy, maybe based upon log level, occurrence etc.
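
As a rough idea of what a level-based eviction strategy could look like, here is a minimal sketch. Nothing in it exists in the appender today: the class, the enum and the method names are all hypothetical, and the TTL values in the usage note are just examples. It keeps the same convention as the eviction property, where 0 means "keep forever".

```java
import java.util.EnumMap;
import java.util.Map;

/** Sketch: pick an expiry (in seconds) per log level, falling back to a default. */
final class LevelEviction {
    enum Level { DEBUG, INFO, WARN, ERROR }

    private final Map<Level, Integer> ttlSeconds = new EnumMap<>(Level.class);
    private final int defaultTtl;

    LevelEviction(int defaultTtl) {
        this.defaultTtl = defaultTtl;
    }

    /** Registers a TTL for one level and returns this policy so calls can be chained. */
    LevelEviction ttl(Level level, int seconds) {
        if (seconds < 0) {
            throw new IllegalArgumentException("TTL must not be negative");
        }
        ttlSeconds.put(level, seconds);
        return this;
    }

    /** Expiry to store with the document; 0 keeps the document forever. */
    int expiryFor(Level level) {
        return ttlSeconds.getOrDefault(level, defaultTtl);
    }
}
```

With new LevelEviction(0).ttl(Level.DEBUG, 3600).ttl(Level.INFO, 86400), debug logs would live for an hour, info logs for a day and everything else forever. One thing to watch out for, as far as I know: memcached-style expiry values above 30 days are interpreted as absolute Unix timestamps rather than offsets.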

After adding the property file, you are now ready to use the couchbase logger. So in your java project, just use your normal log4j statements and levels like so:

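// Log is an ordinary Log4J 1.2 logger (org.apache.log4j.Logger), e.g.:
// private static final Logger Log = Logger.getLogger(TestLogger.class);
Log.warn("This is a warning!");
Log.error("Auch we got a stacktrace", exception);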

A more detailed description of the log levels etc can be found at Log4J.

Dataformat

When the log statements are implemented like above and the property file is set up correctly, the Couchbase appender will put all your log data into the default bucket in Couchbase. If you want another bucket specifically for your logs, you can set it via the property file (loggingBucket property). The log will be persisted as a Couchbase document that looks something like this:

{
  "message": "Auch we got a stacktrace",
  "hostname": "58b03572d12c.netpoint.com",
  "thread": "main",
  "timestamp": 1366404509322,
  "fieldData": {
    "level": "ERROR",
    "mdc": {},
    "file": "TestLogger.java",
    "exception": {
      "exception_class": "java.lang.StackOverflowError",
      "exception_message": "Craaaap",
      "stacktrace": "java.lang.StackOverflowError: Craaaap\n\tat dk.braintrust.os.logger.TestLogger.testErrorLogger(...org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)"
    },
    "class": "dk.braintrust.os.logger.TestLogger",
    "line_number": "19",
    "method": "testErrorLogger"
  },
  "exceptionInformation": {
    "exception_class": "java.lang.StackOverflowError",
    "exception_message": "Craaaap",
    "stacktrace": "java.lang.StackOverflowError: Craaaap\n\tat dk.braintrust.os.logger.TestLogger.testErrorLogger(TestL.. java:390)\n\tat org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)"
  }
}

The layout is based somewhat on the work done at https://github.com/lusis/log4j-jsonevent-layout, so it is easily portable to and from Logstash.

Views and GUI?

So now that the logs are stored in Couchbase, what do we do with them? Once they are persisted in Couchbase we can make views based upon what we want to explore in the logs.

We can use these views to make the logs easily searchable in a webgui etc.
But because I wanted the appender to be a small and understandable project, I find the views and the GUI a bit out of scope.

I have therefore started a new little pet project called couchpotato. This project will put some views on the documents and add a GUI on top - probably based on AngularJS or Knockout. Another thing the couchpotato project might add is full-text search in the JSON documents, no matter what structure they have. That is going to be a neat feature and is easily done through Couchbase using the Elasticsearch plugin. I will deal with this project in a later blog post.

TODOs

So what is my log appender missing and what features would be nice to have?

I've developed this appender specifically for my current customer, so it is for Log4J version 1.2.x only.
It would also be nice to develop it for the 2.x version of Log4J and maybe do a Logback implementation, since Logback is gaining a lot of traction at the moment. Logback natively implements the SLF4J API. This means that if you are using Logback, you are actually using the SLF4J API.

Otherwise these are on my TODO list:

Comply better with the Logstash data model.
Maybe store the log objects in a temporary memtable to be able to continuously pump log data fast.
Clean up code.
Make the log appender asynchronous.
Do a massive load test, to see if it holds water.
Strategy for the log if the log appender gets disconnected from the cluster. Should it put it into a queue, write it locally to disk, or something else?
Develop an appender for the Log4J 2.x standard, maybe in the same project.
Put the project into the Maven Central releases (right now it's residing in snapshots)
Make a prototype of a GUI for testing out the maps and views - I've started a template project: https://github.com/zooldk/couchpotato
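
For the disconnect question in the list above, one of the options (a queue) could look roughly like this. It is only a sketch of one possible answer and not a decision: the class name and its methods are hypothetical, and I assume that during an outage it is acceptable to drop the oldest entries once the buffer is full, so the application never runs out of memory because of logging.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch: bounded in-memory buffer that holds log documents while the cluster is unreachable. */
final class BoundedLogBuffer {
    private final Deque<String> pending = new ArrayDeque<>();
    private final int capacity;
    private long dropped;

    BoundedLogBuffer(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Adds a document; when the buffer is full the oldest document is dropped first. */
    synchronized void offer(String json) {
        if (pending.size() == capacity) {
            pending.pollFirst();
            dropped++;
        }
        pending.addLast(json);
    }

    /** Returns everything buffered so far (oldest first) and empties the buffer. */
    synchronized List<String> drain() {
        List<String> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }

    /** Number of documents that were thrown away because the buffer was full. */
    synchronized long dropped() {
        return dropped;
    }
}
```

On reconnect the appender would call drain() and write the documents to the cluster; writing them to local disk instead is the more durable alternative, at the cost of more moving parts.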

So what will be next on the roadmap for this little project? I think load testing the appender to see if it scales will be the next priority, and also building my couchpotato project. Building views and using them in a GUI makes it easier to understand what is missing and what flaws it has.

BTW: If you are interested in logging and couchbase, you should really read Michael Nitschinger's blogpost about setting up log4j for the Couchbase client. The blog is about the internal logging of the couchbase java client and how you set it up and tweak it.

Links

Well that's it for now, Cheers!

/Steffen
Twitter: zooldk

Sunday, May 19, 2013

Confused about Google and their XMPP advocacy

At Google I/O 2013 (15-16th of May), Google announced that the new Hangouts is not based on XMPP as its messaging backend.

The Verge ran the story shortly after the announcement (http://www.theverge.com/2013/5/15/4318830/inside-hangouts-googles-big-fix-for-its-messaging-mess). At first I was a bit shocked, so I tried to dig a bit deeper into what it means and what had happened.

Apparently Google did it to be able to control the protocol themselves and to make all of their own products interoperable, and not the other way around???
Their reason for ditching XMPP for Hangouts was also that they found their messaging landscape too fragmented and too messy. They wanted to be at the front of the development and gain a bigger market share compared to iMessage (Messages), WhatsApp, Facebook Messenger etc. Ironically, many of these services run on XMPP (even though not federated, as Google's was).

Google also claims that they tried to use XMPP to open up their infrastructure and federate with other XMPP vendors, but that the other entities seemed reluctant to do the same in return..
I would say that this is a BIG claim - most vendors and big entities I know of participate in the making of the core protocol and the XEPs (extensions), but I haven't seen Google involve themselves in these discussions in the XSF (XMPP Standards Foundation) and in the making of the standards for a while. We have an excellent XMPP community and a really open and friendly XSF, with a lot of meetings and discussions.

What I have seen of participation from Google is that they twist some of the extensions in their own little fuzzy way without giving the changes back to the standards - which means they claimed to use the standards, but were really not 100% compliant in some cases. Some of the things they "secretly" added to the protocol were good and some were bad.

It's an open standard protocol, so participate Google! And contribute not only to open source, but also to open standards which are becoming more and more important.

Back to the current issue. So what does it mean for the end users and other services? According to a reliable source, Google is not offering S2S support and only limited client-to-server (C2S) support, which means ordinary chat only and no group chat, file transfers etc. This basically means no federation from/to Google Hangouts, which is a bummer!

For the end-users the coming years will bring an even more split market, confusion and walled gardens!
Google have drawn a line in the sand and you can either be on their side or on some other silo...

This will split many of the users. Some will be on Facebook, some on Google, some on WhatsApp and whatnot. Each of their users will be locked into their respective silos..

So what will be next, Google? SMTP/IMAP, HTTP or other open standard protocols?
As an example: if Google closed the IMAP access because they wanted me to use their Gmail UI, I would be frustrated! I have a lot of different mail accounts and I want my mail messages to be gathered in my own native client with IMAP. Even worse, if they closed down SMTP I would not be able to route my mail to users outside of Google - which is quite a good analogy to the XMPP federation issue we have here!

So what can/shall we do about it? Well, it's not easy to come up with a simple solution. For example, what do we do with a "couple" of million users coming from Google chat? Do we get them onto one single XMPP domain, e.g. jabber.org or FSF (Free Software Foundation)? Or do we encourage them to set up their own federated XMPP service at home?

The answer is not easy - I definitely think that we need some kind of bigger XMPP service provider(s) where all newcomers to XMPP can go and just get a new account, set up a group chat (MUC), and do whatever they want. Not all newcomers know how to set up their own domain and XMPP service - even though it is quite easy (I'll write a blog post about that later!).
But setting up such a service also comes with a price! Maintaining and keeping an XMPP server operational at a large scale for a long period of time is NOT easy! - you have to deal with spammers, black hats, 99.9999% uptime, strange client behaviours and patterns etc.
Even though we can blame Google for a lot of stuff, they still maintain(ed) the largest XMPP service on the Internet, and they do/did a good job of keeping it up and maintaining it. That said, I've had some problems with subscriptions and blacklisting from Google that break the XMPP core protocol and specifications, but that's another issue. :-)

After reading a blog post from Mickael (@mickael): http://blog.process-one.net/google-cloud-messaging-update-boosted-by-xmpp/, I found something even more funny and confusing.

Google announced their new backend for Google Cloud Messaging (GCM) which is based on XMPP..

So Google's claim about streamlining their infrastructure and interoperability really is confusing!
Reading up further on GCM (see http://developer.android.com/google/gcm/ccs.html), it sure looks like it uses XMPP for persistent connections towards the client - sending JSON as a payload inside a message stanza.
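
To show what "JSON inside a message stanza" means in practice, here is a small sketch that builds such a stanza as a string. The gcm element and its google:mobile:data namespace are how I read the CCS documentation linked above; the class, the method names and all the values are made up for illustration, and the sketch assumes that the registration ID and message ID contain no characters that would need JSON escaping.

```java
/** Sketch: wrap a JSON payload in a message stanza the way the CCS documentation describes it. */
final class CcsStanzaSketch {
    /** Escapes the characters that may not appear raw in XML character data. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** dataJson must already be a valid JSON object, e.g. {"hello":"world"}. */
    static String downstreamMessage(String registrationId, String messageId, String dataJson) {
        String json = "{\"to\":\"" + registrationId + "\","
                + "\"message_id\":\"" + messageId + "\","
                + "\"data\":" + dataJson + "}";
        return "<message id=\"\"><gcm xmlns=\"google:mobile:data\">"
                + escapeXml(json)
                + "</gcm></message>";
    }
}
```

So the XMPP layer only carries an opaque text payload here; everything that is GCM-specific lives in the JSON.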

"The GCM Cloud Connection Server (CCS) allows third party servers to communicate with Android devices by establishing a persistent TCP connection with Google servers using the XMPP protocol."

So besides Hangouts (which does not use XMPP anymore), their messaging backend infrastructure for Android, Chrome etc. is using XMPP for server push.
Hmmm, confused about Google and their XMPP advocacy?.. I sure am!

Because this blog post is written on blogger.com (a Google domain), I sure hope that you, dear viewer, are able to view this post over HTTP and not only SPDY..