Monday, July 21, 2014

Gnocchi Research: Week 10

At the end of Week 9, I found that the Mongoose library (in C) was too difficult for me to work with, so I searched for alternatives. Python seemed like a good choice as a language: it had high-level operations and better-documented APIs than C.

Olga suggested I run with Flask or CherryPy. Both seemed quite similar and fulfilled the same purpose: providing the API for a web server.

I chose Flask and started implementing several server-side operations at a functional level, using the server's file system for storage.

  • PUBLISHER_REGISTER and SUBSCRIBER_REGISTER
    • A POST that registers the publisher/subscriber by storing their public key. 
  • CREDENTIAL_QUERY
    • A GET operation that tells a publisher/subscriber information about a target entity. 
  • ADD_MEMBER
    • A POST that allows publishers to add subscribers. 
  • JOIN_GROUP
    • A POST that lets subscribers request to join a group
  • RETRIEVE_MEMBERSHIP_REQUESTS
    • Publishers GET membership requests from the previous JOIN_GROUP for a particular group
  • SUBSCRIBER_UNREGISTER
    • Deletes a subscriber from all of its groups, freeing up space in the server
  • DELETE_MEMBER
    • Publisher POST that removes a member from a group
    • New encrypted group keys must be uploaded eventually via UPDATE_GROUP_KEY
  • GET_GROUP_KEY
    • Subscriber GET request that retrieves his/her encrypted group key for a particular group
  • RETRIEVE_MEMBERSHIP
    • Publisher GET command that retrieves the members of a group he/she created
  • UPDATE_GROUP_KEY
    • Follow-up to DELETE_MEMBER
    • Publisher POST that uploads new encrypted group keys to everyone who wasn't deleted
  • CREATE_GROUP
    • A GET request by publishers to get a unique Group ID for the keyserver.
    • Also registers that group for the publisher
  • SUBSCRIBER_GROUP_LIST
    • GET request by subscribers that shows what groups he/she is a part of
  • PUBLISHER_GROUP_LIST
    • GET request by subscribers that shows what groups a certain publisher is in charge of
All of these are done on some level aside from UPDATE_GROUP_KEY, which probably needs some thought first. Perhaps we should use a special file structure that contains users and their encrypted keys, which the server could parse.
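
One possible shape for that "special file structure," as a rough sketch rather than what the key server actually does: one JSON file per group, mapping each subscriber ID to that subscriber's encrypted group key. The function names, the file naming scheme, and the use of JSON are all my own assumptions for illustration.

```python
import json
import os
import tempfile

def group_path(root, group_id):
    """Return the (hypothetical) path of the JSON file that stores one group."""
    return os.path.join(root, "group_%s.json" % group_id)

def update_group_key(root, group_id, encrypted_keys):
    """Replace the group's key table with {subscriber_id: encrypted_key}.

    Subscribers missing from encrypted_keys (for example, deleted members)
    get no entry, so they cannot fetch the new key.
    """
    os.makedirs(root, exist_ok=True)
    # Write to a temp file, then rename, so readers never see a half-written table.
    fd, tmp = tempfile.mkstemp(dir=root)
    with os.fdopen(fd, "w") as f:
        json.dump(encrypted_keys, f)
    os.replace(tmp, group_path(root, group_id))

def get_group_key(root, group_id, subscriber_id):
    """Return the subscriber's encrypted group key, or None if there is none."""
    try:
        with open(group_path(root, group_id)) as f:
            return json.load(f).get(subscriber_id)
    except FileNotFoundError:
        return None
```

A real server would also have to sanitize group_id before using it in a file name; that is left out here.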

Additionally, verification will have to be implemented: We don't want the key server to allow a non-creator publisher to delete members of a group :)
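
A minimal sketch of that check, assuming the server keeps a table that records who created each group. The table layout, the exception name, and the function signature are hypothetical, and this only covers authorization: proving that the requester really is that publisher (say, with a signature checked against the registered public key) is a separate step not shown here.

```python
class NotGroupCreator(Exception):
    """Raised when a publisher tries to manage a group it did not create."""

def delete_member(groups, group_id, requester, member):
    """Remove member from a group, but only if requester created the group.

    groups is a hypothetical in-memory table:
    {group_id: {"creator": publisher_id, "members": set_of_subscriber_ids}}
    """
    group = groups[group_id]
    if group["creator"] != requester:
        # A non-creator publisher must not be able to change membership.
        raise NotGroupCreator(requester)
    group["members"].discard(member)
    return sorted(group["members"])
```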


Wednesday, July 9, 2014

Gnocchi Research: Week 9

This week started with looking for a protocol or framework that could make implementing the key server simple. We wanted something that supported HTTP because the Chrome extension spoke the language of browsers, HTTP. We had a couple of options:
  1. Open Source HTTP web servers in C/C++
  2. Python's resources for HTTP web servers
  3. Less academically used frameworks with good APIs such as Node.js
Pros and Cons
In C/C++, I found two seemingly well-maintained libraries: Mongoose (C) and Pion Network Library (C++). One desired characteristic of the key server is that it is lightweight. Pion seemed to take much longer to compile and had many more files involved, whereas Mongoose was just a pair of .h and .c files. Pion, using Boost and possibly other C++ features, may have performance advantages over Mongoose, but for simplicity's sake, I chose Mongoose over Pion.
(Additionally, Pion seems incredibly difficult to compile. On Ubuntu 12.04, I got furthest by installing Boost 1.55, recompiling OpenSSL with ./config shared, modifying makefiles to include -fPIC, and moving libssl.a to the directory Pion looked in. Even after copying all the Boost files to where the old v1.48 ones were, there were still linking issues with Boost when trying to compile a sample program.)

Libraries in Python might be more developed than possible web server libraries in C/C++. For now, I chose Mongoose (C) because I didn't want to be slowed down by my lack of Python knowledge. Perhaps I will go back and use Python instead.

Something like Node.js or Ruby on Rails seemed useful for developing a key server. However, we wouldn't have much control over the way certain protocols are implemented, and there could be hidden security issues or bugs. Working in C/C++/Python gives us control over the implementation but opens the door to our own bugs. Additionally, people in academia might frown upon the use of these newer, more "hip" technologies over languages everyone knows.

To be continued....


Thursday, July 3, 2014

Gnocchi Research: Week 8

This week I started to work on implementing the key server.

The Key Server must be able to do the following:
1. Register a client or publisher
2. Allow users to upload/download keys
3. Allow publishers to manage groups
4. Allow users to query the database

By "keys," I mean two things: encrypted group keys, which the key server serves to users, and the public keys of clients, which it serves to publishers. The primary purpose is to serve encrypted group keys to users so that they can decrypt them with their private keys and recover the group key.

At first, I thought this could be done in a standalone web application separate from the publisher side and the client side. On the development side, I think it would have been simple. However, end users would have to interact with another program (the web application that talks to the key server), which could easily confuse people. Some operations might not be intuitive to the average person.

We concluded that the key server, at least on the publisher side, should work in conjunction with the publisher app. Right now, the publisher app is a command line utility. When publishers want to upload their group information and keys to the key server, they can use the upload utility on the publisher app, and then data would be sent to the key server, which would process it.

I know a little bit about how server programs send and receive data. A server binds a socket to a port and listens on it; a client then connects to that port. Once the connection is established, the server can receive data from the client as a byte stream, and it can send data back to the client as a byte stream over the same connection.
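
To make that concrete, here is a small sketch using Python's standard socket module (not the key server's actual code; serve_once and round_trip are names I made up). A one-shot server binds to a localhost port and listens, a client connects, and bytes travel in both directions over the same connection.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one connection, read a byte stream, and send a reply back."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"received: " + data)

def round_trip(message):
    """Start a one-shot server on localhost and send it message as a client."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    # The client side: connect, send bytes, and read the reply on the same connection.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)
    worker.join()
    server_sock.close()
    return reply
```

Calling round_trip(b"hello") sends five bytes to the server and returns whatever the server wrote back. A single recv() is enough here only because the messages are tiny; a real protocol would need framing.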

For the client side, we don't know if work should be done cooperatively. It might be easier for users to go to a website and receive their encrypted keys through a website application. Perhaps users will use the website, which will show their keys in plaintext, or perhaps they can download a file containing their encrypted key.

We have not yet put much thought into security. We are trying to be flexible so that changes can be made easily. As of now, we are thinking of using Kerberos to authenticate users and the key server to each other. Perhaps SSL could be used instead. Even though part of Gnocchi is to show that SSL could be weak, there has to be some first level of trust.

One design alternative thought up was to use a web development framework to implement the key server. In particular, I wanted to try this in Ruby on Rails. It's hip, new, and reportedly easy to get stuff started. I'd be learning something pretty interesting. However, my intuition tells me academia probably wants to use technologies that are well known. As a result, I'm working on writing the key server in C++ in a manner that makes it easy for the publisher app to interact with it.

Another design alternative was to work with HTTP GETs and POSTs for sending and receiving content (as in, keys). This may actually have to be done, as GETs and POSTs are the language of internet browsers, and we do have a Chrome extension for the client app.
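
As a rough illustration of what "keys over GETs and POSTs" could look like, here is a sketch built only on Python's standard http.server and http.client modules. The URL layout (/keys/<name>), the in-memory KEYS table, and the helper names are assumptions for the example, not a design we have settled on.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

KEYS = {}  # hypothetical in-memory store: URL path -> uploaded key bytes

class KeyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Store the request body under the request path.
        length = int(self.headers.get("Content-Length", 0))
        KEYS[self.path] = self.rfile.read(length)
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        # Return the stored key, or 404 if nothing was uploaded for this path.
        body = KEYS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def start_server():
    """Start the server on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), KeyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def request(port, method, path, body=None):
    """Send one HTTP request to the local server; return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, path, body=body)
    response = conn.getresponse()
    data = response.read()
    conn.close()
    return response.status, data
```

A publisher-side upload would then be a POST to a path such as /keys/alice, and the client-side fetch a GET on the same path.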

Gnocchi Research: Weeks 6-7

I missed week 6's blog post, so this post will comprise both weeks 6 and 7.

Contrary to what I said in a previous post, OpenSSL's verify application actually does verify signatures. It turns out the proper way to run the verify command is:
openssl verify -CAfile <file with only trusted certs but can be multiple> <singular cert to verify>
With that in mind, I ran
openssl verify -CAfile adtrust_and_incom.crt fake.crt
where fake.crt was a certificate with modified entries, and the verify command failed. The reason my earlier attempts succeeded is simple: OpenSSL's verify utility only checks the first certificate in the file it is given. For the <singular cert to verify> argument, I had been concatenating the 3 certificates of a chain (adtrust-incommon-fake) with adtrust.crt on top, so verify only checked the validity of adtrust.crt (which always succeeded because the argument to -CAfile was adtrust.crt itself). This time, as expected, the error generated was error 7: invalid signature.
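
The mistake is easier to see by looking at what a concatenated file contains. The sketch below (plain Python, no OpenSSL involved; the function names are made up) splits a PEM bundle into its certificate blocks and picks out the first one, which, going by the behavior described above, is the only one verify treats as the certificate to check.

```python
import re

# Matches one PEM certificate block, including its BEGIN/END marker lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL)

def split_bundle(pem_text):
    """Return every certificate block found in a concatenated PEM file."""
    return PEM_CERT.findall(pem_text)

def first_cert(pem_text):
    """Return the first certificate block in the file, or None if there is none."""
    certs = split_bundle(pem_text)
    return certs[0] if certs else None
```

With adtrust.crt on top of the bundle, first_cert() returns the root certificate, so the fake certificate at the bottom was never the one being examined.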

As for other vulnerabilities to black-box test, there could be problems with how CA signing interprets the data read from the CSR (certificate signing request). Is bounds checking bad or missing, or does OpenSSL properly read in the various data entries? Even though we know CA request generation doesn't output files with super long entries (over 1024 bytes), what if requests were modified to include such entries? Perhaps buffers could be overflowed.


Tuesday, June 17, 2014

Week 5: Testing Gnocchi!

I am currently not working on OpenSSL. Looking at old code was very tiring and hard to stay focused on, and I felt that I could be learning more by putting effort towards the Gnocchi side of things.

So with that, I figured the best thing I could do is to write tests for the publisher side of Gnocchi! While we don't have a spec, I can still get experience writing good tests. Honeyman once said that one thing that distinguishes UofM grads in the field is their ability to write good tests, so hopefully I can one day live up to that reputation.

I am using Google's C++ Testing Framework to make a test suite for Gnocchi.

Despite being run from the command line, the output is neat and organized.

[Screenshot: Google's C++ Testing Framework in action]
Each of these tests reports OK if the TEST() body returns without error and every EXPECT_*() assertion in it passes. For example, I can have a test as follows:
TEST(basicTest, zero){
    int zero = 0;
    EXPECT_EQ(0, zero);
}
This test, if run, will report OK.

The testing framework also provides EXPECT_THROW(). Basically, if the tested code throws an exception of the type we said to expect, the assertion passes and we are still OK.

Gnocchi, in its current iteration, is using Byte Vectors (which I think is fine, especially considering we don't really have a specification yet) to store data in memory. What this means is that for me to make unit tests on individual parts of Gnocchi, I have to EXPECT equality of vectors, which currently isn't supported by Google's code. Fortunately, I found a small extension to the testing framework that allows for EXPECT calls on containers that can be iterated (meaning vectors, of course).

The testing framework accepts options such as --gtest_repeat=100 and --gtest_break_on_failure, which let the user customize how the tests are run. These options helped me find a bug in the wind() function in Gnocchi, which was removed in the next patch.

As Gnocchi continues to be modified, the tests created will have to change. That's fine for me, however, because I think I'm learning more about writing good tests that can be easily modified :). Unfortunately, I am still only writing unit tests. Probably this week, I will make higher-level tests that can also test to make sure we have expected behavior on the directory level. Further down the road, perhaps I can benchmark Gnocchi's sign/encrypt/decrypt/etc functions.


Misc:
I noticed Gnocchi was taking an abnormally long time to compile on my system, and I wanted to know why. My laptop's processor isn't the fastest (~2.66 GHz Arrandale), but I even tried compiling on a desktop Haswell i7, which didn't result in much of a speedup. Part of the problem could be that I'm developing in a virtual machine (this is actually preferred for me because I have the best keyboard/mouse input that way). I give it 3 GB of memory (which should be enough?) and 3 hyperthreaded cores, and it's stored somewhere on an SSD. But perhaps reads/writes are not very fast on virtual machines. Perhaps compiling needs more than 3 GB of memory. Ultimately, however, there are (right now) many dependencies in each part of Gnocchi, which I learned contributes to long compile times; hopefully we can cut down on that and reduce compile times in the future.

Monday, June 9, 2014

Week 4: OpenSSL part 3

In order to test potential exploits I thought of, I set up a private web server using an old computer box. It was my first time doing so, and I learned a bit about routers, networking, ports... (which unfortunately is not the topic of this blog post).

OpenSSL's verify utility can be used to verify certificate chains. In order to do this, one would have to concatenate the chain of certificates into one file, from root-level first. For example, umich was signed by InCommon, which was signed by AdTrust, a root-level Certificate Authority. To have OpenSSL verify it, one would use the commands:
cat adtrust.crt incom.crt umich.crt > combinedCerts.crt
openssl verify -CAfile adtrust.crt combinedCerts.crt
Here, -CAfile specifies a trusted root certificate.

I tried to step through the process of making a fake certificate.
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr
openssl x509 -req -in server.csr -CA incom.crt -CAkey server.key -CAcreateserial -out fake.crt -days 500
Unfortunately, the 3rd command failed because the "private key didn't match the certificate," although the check in OpenSSL that raised the error actually compares public keys. (Now that I think about it, I was probably misinterpreting private key to mean the (m,d) pair rather than the server.key file.) To have the public keys match, I modified the first 6 base-64 lines of server.key to contain the public key of the certificate I was signing with (incom.crt) and saved the result as fakeIC.key. The command
openssl x509 -req -in server.csr -CA incom.crt -CAkey fakeIC.key -CAcreateserial -out fake.crt -days 500
was run, and it succeeded without error. Verification with OpenSSL in the manner described above succeeded as well! Apparently OpenSSL's verify utility only checks to see that certificates... have the names of their signers on them? I put the certificate chain (adtrust-inCommon-fake.crt) onto the SSL portion of my web server and tried accessing the HTTPS port via Chrome. There was an error: fake.crt had an invalid digital signature.

My thoughts at this point were: Perhaps it's because I didn't use x509v3 extensions, which the umich certificate had. Or perhaps I could fake-sign with umich's certificate rather than incom's certificate.

I found out that signing certificates with v3 extensions required a v3 extensions file, which holds the data to be placed in the certificate as v3 extensions. I made such a file, v3.ext, and filled it with the same v3 extension information as umich's certificate. Signing with v3 extensions, as well as signing with umich's certificate, both failed the Chrome test.

At this point I figured it was because of the actual "signature" portion of the certificate being wrong (silly me). Apparently actual SSL checks the signature, unlike OpenSSL's verify.

I also found that batch requests for OpenSSL's req utility don't actually make/sign multiple certificate requests. Rather, they omit the part of openssl req where the user inputs their certificate subject/credentials via standard input. Optionally, the subject/credentials can be taken from a file. So for someone making many requests using similar subject/credentials, they might use the batch utility as well as a script, but ultimately OpenSSL does not handle requests in bulk. This was further verified by UofM ITS, who said they don't handle multiple certificate requests at the same time either.

Monday, June 2, 2014

Week 3: Using OpenSSL...

...to create signed certificates without the private key!

While dying of confusion from looking at OpenSSL code, I stumbled upon something interesting. The function that checks the private keys of the CA Certificate file and the CA key file actually just checks the public keys. Because of this, it should be possible to use OpenSSL's utilities to create a signed certificate using just the CA Certificate PEM file: Modify the base64 encoding of the CA Certificate PEM file to use the same public key as any random CA key file. (Alternatively, one could modify their CA key file to have the same public key as the CA Certificate PEM file.) This seems like it would work with the UM Web CA Certificate. However, I don't think something like this would work with something root-certified such as the InCommon CA.

For more clarity, on a standard self-signing process, several files are involved:

  • CA/Root Private Key file - rootCA.key
  • CA/Root Certificate file - rootCA.pem
  • Certificate Serial number file - rootCA.srl 
  • Certificate Request file - newcsr.pem
  • Signed Certificate file - newcert.pem
Private keys and root certificates have matching public keys in their PEM encoding, so programs like OpenSSL can verify that the certificate file's modulus and exponent match the private key's. Then, someone (possibly yourself) generates a certificate request and saves it into a file. Various details go into this, such as Country Name, Locality Name, Organization Name, Common Name, and email. I looked into the code that corresponded to the generation of certificate requests, and luckily, as far as I could tell, one cannot use a buffer overflow attack to wreak havoc on any online server that creates certificates using OpenSSL's latest version. Finally, the PEM format of the CA Certificate file matches the format of the Signed Certificate file.

The PEM format is, alas, a bit complex. For RSA key files, you have some header information followed by the modulus, public exponent, private exponent, p, and q. This is simple compared to the PEM formats for Certificate Request and Root/Signed Certificate files. They have headers (whose meaning has to be looked up on old documentation websites), followed by all the subject info (Country Name, State Name, etc.) interlaced with even more header information. At some point, the PEM file format describes the encryption info (like 1024 bit RSA Encryption), followed by the modulus, exponent, certificate attributes, and signature algorithm (like sha1WithRSAEncryption).

Converting between base64 and hex takes a bit of effort as well. What we have is actually several lines of 64-character "words." Each line uniquely translates to a 384-bit word. But a modulus might begin in the middle of one line and end in the middle of another, so modifying it takes special care. Conversions from hex to base64 don't make sense unless the input is a 384-bit word (96 hex characters, where 0x41 counts as 2 hex characters). Because of this, one would have to find every line affected by a change, modify its hex value, and then convert each line individually from hex back to base64.
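
The arithmetic can be checked with Python's standard base64 and binascii modules. This is only a sketch of the line-by-line conversion described above (line_to_hex and hex_to_line are names I made up): 64 base64 characters carry 6 bits each, so one full line is 384 bits, which is 48 bytes or 96 hex characters.

```python
import base64
import binascii

def line_to_hex(b64_line):
    """Decode one base64 line of a PEM body and return its hex string."""
    return binascii.hexlify(base64.b64decode(b64_line)).decode("ascii")

def hex_to_line(hex_string):
    """Encode a hex string back into a single base64 line."""
    return base64.b64encode(binascii.unhexlify(hex_string)).decode("ascii")
```

For example, the 64-character line made of "QUJD" repeated 16 times decodes to the bytes "ABC" repeated 16 times, whose hex form is "414243" repeated 16 times (96 characters), and hex_to_line() turns that back into the original line.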

OpenSSL is notoriously difficult to read through. There's very little documentation, and function calls go everywhere: they jump around different source files in different folders. The certificate signing is done in the apps/x509.c file, whose simple sign function has a tree of function calls reaching perhaps 25 or more other *.c files, such as crypto/x509/x509_vfy.c, and referencing files like /include/openssl/x509.h.

I'm also trying to envision the Gnocchi top level structure while I'm at it. I really don't like sifting through OpenSSL code, which I chose to do because I wasn't familiar with the server/JavaScript involved with the Gnocchi development side. We still have to figure out a good key revocation scheme, meaning our idea of having 4 key portions (Publisher, High-Bandwidth Content Server, Client, and Low-Bandwidth Key Server) is not set in stone.

I talked to the guys at ITS, and it looks like we're getting a chance to see how their certificate approval works!

Tuesday, May 27, 2014

Week 2: Investigating OpenSSL

For Week 2, I mainly looked at OpenSSL code. It is quite burdensome. The code isn't very well documented. I've generated my own self-signed certificates with OpenSSL, which wasn't too complicated. Five files in total were generated: a root private key file, a root certificate file, another private key file, a certificate request file, and a (self-signed) certificate file. Each of these files was encoded in base64. Luckily, base64 to hex isn't something vastly complicated, so I was able to figure out what the different hex strings in the base64 encodings all meant. Base64 conveniently keeps these files shorter than a hex dump would be (though about a third larger than the raw binary) and in a text format that can easily be, say, emailed. Oh, and I forgot to mention these files have the extension .pem, which stands for Privacy-Enhanced Mail. Cool.

I don't want to say too much, though... (It's not like we have much to say yet anyways.)

On another note, Facebook apparently resizes photos when you upload them to their servers, so you can't download a, say, 4MB photo that you uploaded earlier to Facebook; it'll be 1MB or smaller. What this means is that you can't use steganography on photos you upload to Facebook. Sad.
Edit: Actually, there's more. I tried uploading a small photo with some encoding (even with the "high quality" option), but Facebook ultimately still cut down on bytes, ruining the steganographic file.
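
Here is a toy sketch of why recompression is fatal, assuming the simplest kind of steganography: hiding bits in the least significant bit of each pixel byte. This is not the tool I used and certainly not what Facebook's pipeline does; the lossy() function is just a crude stand-in for any processing that perturbs the low-order bits.

```python
def embed(pixels, bits):
    """Hide bits in the least significant bit of each pixel byte."""
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)

def extract(pixels, count):
    """Read back the least significant bit of the first count bytes."""
    return [p & 1 for p in pixels[:count]]

def lossy(pixels, step=4):
    """Stand-in for recompression: round each byte down to a multiple of step."""
    return bytes((p // step) * step for p in pixels)
```

Extracting from the untouched output of embed() gives the message back, but after lossy() every low bit is zero, so the hidden message is gone even though the image would look nearly the same.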

Monday, May 19, 2014

Week 1: Introductions

On Day 1 I read papers on SFSRO and tried to learn about Merkle trees. We had a meeting discussing administrative things as well as what the Gnocchi/noSSL project would be about. It can be summarized in one phrase: "Protect content, not connections." Days 2-5 were mostly the same thing: reading research papers to learn what was going on. I reread Fu's paper on SFSRO to get as strong an understanding as possible and thought about how it could be implemented in a Chrome extension client. Since I knew no JavaScript, I had to learn a bit of the language (as well as how to write Chrome extensions) so that I could understand what its capabilities were and contribute towards the development of the client. That was sorta fun: JavaScript seems to be syntactically a bit similar to C++. Looking into GSS, I figured out that at best it's only another crypto API that we could use, but the basic underlying idea is pretty good: Seal, Encrypt, Send; Receive, Decrypt, Verify. I also delved into the Plutus file system, which is sort of similar to SFSRO. Finally, I looked into OpenSSL code for a day. It is NASTY. The unique thing about SFSRO is that it is pretty good for supporting read/append only. I'm still trying to imagine how that would work in real-life examples such as social media, though. Unfortunately, nothing exciting has happened yet. We still have to come up with a specification for the project....