Friday, 18 May 2018

.Net on Elastic Beanstalk (AWS) with VPC

I have set about considering whether AWS is a decent alternative to Azure for cloud stuff. Azure is obviously more Windows and .NET friendly, but its virtual networking is currently pretty poor, and I know that AWS has a fairly decent VPC offering, so I wondered if I could copy the basic infrastructure used on Azure for PixelPin but on AWS!

I started with an article here, although it is for RDS (Relational Database Service) and I wanted to use DynamoDB (NoSQL), so I decided to work through it and find out how to make it work.

There are some steps that are assumed before you can follow those instructions, so we need to do these first and then mostly follow the linked article (the steps are copied below):

First things

1) Allocate yourself an Elastic IP address in the VPC panel -> Elastic IPs. It should be of type VPC.
2) Create a key pair for access to the virtual machines under the EC2 panel -> Key Pairs, saving the .pem file somewhere safe.
3) In IAM, create a ROLE for Elastic Beanstalk. You can choose the service from the list and accept the two Elastic Beanstalk policies. Give it a suitable name.
4) In IAM, create another ROLE for EC2. Select the policy AWSElasticBeanstalkService. Creating the EC2 role automatically creates an instance profile for later.

Create VPC

5) Go into the VPC panel, click VPC Dashboard on the left and press Start VPC Wizard.
NOTE: The following notes are for a single private subnet, but if you want high availability, you should choose three subnets, one for each availability zone in a region. This will allow the database nodes to be resilient to a single zone failure. The web servers will all need to be in the same zone as the load balancer, however.
6) Select the template VPC with Public and Private Subnets on the left and press Select. This is a good default template that allows public entities like the load balancer to be accessible from the internet while keeping private servers like web instances hidden.
7) On the next page, give your VPC a useful name. The default IP allocations should be fine, but ensure that the availability zone for the public and private subnets is the same by selecting one entry from the list.
8) In the Elastic IP Allocation ID field, select the Elastic IP address you created earlier. If you forgot, open another browser tab and create the address in either the EC2 or VPC panel, then come back to this tab and click into the box again to refresh the list. If you can't see the Elastic IP, then you created it as classic instead of VPC; release it and allocate another one.
9) Leave the rest of the defaults and click Create VPC. Creation will take a few minutes.

Security Groups

10) Create a new security group in the EC2 panel -> Security Groups (you can reuse an existing one, but having one per system gives you more flexibility). Open port 80 from all IP addresses (0.0.0.0/0) and select the previously created VPC for this new group. Create a second group for the database cluster with two rules permitting access to port 8111 from CIDRs 10.0.1.0/24 and 10.0.0.0/24.

DynamoDB Cluster

11) Create a new DynamoDB (DAX) cluster by going into the DynamoDB panel and selecting Clusters from the left-hand side. If you haven't created your table yet, you will need to create it first. We don't have space to discuss the details of NoSQL partitioning, but if you are just testing for now, create a table called users, set the partition key to be email of type string, leave the rest of the defaults and press Create (if you would rather script the table creation from .NET, see the sketch after this list). Then click Clusters on the left-hand side.
12) Press Create Cluster and give it a name. Choose a node type, which can be changed later. If you are just testing, there is a small node size.
13) For the IAM role, you either need to select one you have already created, or you can create one here with the relevant permissions. Since my test is for read/write, I have chosen this, but you might be clustering a table that is only for reading.
14) Under Subnet Group, choose Create new, give it a name and description (you could call it TestGroup or something like that), select the VPC you created earlier in the list and select both subnets to cluster against. This will allow both of these subnets to access the database cluster, but you could restrict this if you need to - it actually allows DynamoDB to allocate routable IP addresses to the cluster and its nodes in the correct CIDR for your VPC subnets.
15) Select the DAX security group you created earlier in the list and click Launch Cluster. This will also take a few minutes to complete.
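
If you would rather script the table creation from .NET than click through the console (the sketch mentioned in step 11), something like the following should work. This is only a sketch: it assumes the AWSSDK.DynamoDBv2 NuGet package and the test values from step 11 (a users table with an email string partition key); adjust names and throughput to suit.

// Sketch: create the "users" test table from step 11 via the .NET SDK instead of the console.
// Assumes the AWSSDK.DynamoDBv2 NuGet package and credentials/region already configured.
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class CreateUsersTable
{
    public static async Task CreateAsync(IAmazonDynamoDB client)
    {
        await client.CreateTableAsync(new CreateTableRequest
        {
            TableName = "users",
            // The partition (hash) key is the email attribute, of type string.
            AttributeDefinitions = new List<AttributeDefinition>
            {
                new AttributeDefinition { AttributeName = "email", AttributeType = ScalarAttributeType.S }
            },
            KeySchema = new List<KeySchemaElement>
            {
                new KeySchemaElement { AttributeName = "email", KeyType = KeyType.HASH }
            },
            // Tiny throughput values are fine for a throwaway test table.
            ProvisionedThroughput = new ProvisionedThroughput { ReadCapacityUnits = 1, WriteCapacityUnits = 1 }
        });
    }
}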

Elastic Beanstalk

This is like Azure App Services and provides a more containerised (and hopefully faster to scale) service for scalable web applications.
16) Open the Elastic Beanstalk panel and click Get Started.
17) Choose a name and a platform; I am using .NET (Windows/IIS) and a Sample Application. Press Configure More Options.
18) Change the Configuration Preset if needed.
19) Press Modify under Capacity and switch it from Single Instance to Load Balanced. Choose your min and max instance values (use a min of 2 to understand load balancing). You can leave the scaling metric for now and press Save.
20) Press Modify under the Network box and choose your VPC name. Select the box next to the public subnet for the load balancer and the boxes next to the private subnet(s) for the instances. Press Save.
21) Press Modify under Instances and choose what you need. I am choosing t2.medium with a 30GB disk for the Windows machines. Also, choose the security group you created earlier for the web servers. This will not show until you have selected the correct VPC under the network configuration.
22) Press Modify under Security and choose the Elastic Beanstalk role you created earlier, the key pair you created earlier, and the EC2 role you created earlier in the IAM Instance Profile box.
23) Set an email address under Notifications.
24) Press Create App and wait for the system to be configured.
25) Test the newly created endpoint, shown after the system starts up, to ensure it routes correctly and you get the sample app.

Now all that's left is to upload a test app to ensure my instances can see the DynamoDB databases, and perhaps also add a Redis instance to the private subnet.
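
For the DynamoDB connectivity test, a minimal sketch like the one below is enough: it uses the AWSSDK.DynamoDBv2 NuGet package to write and then read a single item in the users table created earlier. The region, table and key names are just the assumptions from the steps above; inside Elastic Beanstalk the SDK should pick up credentials from the EC2 instance profile (going through the DAX cluster itself would need the separate DAX client instead).

// Sketch: verify that an instance inside the VPC can reach DynamoDB.
// Assumes the AWSSDK.DynamoDBv2 NuGet package, the "users" table from the
// DynamoDB Cluster section and eu-west-1 as the region - adjust to suit.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class DynamoSmokeTest
{
    public static async Task<bool> CanReadWriteAsync()
    {
        // On EC2/Elastic Beanstalk, credentials come from the instance profile.
        using (var client = new AmazonDynamoDBClient(RegionEndpoint.EUWest1))
        {
            var email = $"test-{Guid.NewGuid():N}@example.com";

            // Write a throwaway item...
            await client.PutItemAsync("users", new Dictionary<string, AttributeValue>
            {
                ["email"] = new AttributeValue { S = email }
            });

            // ...then read it back by its partition key.
            var response = await client.GetItemAsync("users", new Dictionary<string, AttributeValue>
            {
                ["email"] = new AttributeValue { S = email }
            });

            return response.Item != null && response.Item.ContainsKey("email");
        }
    }
}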

Virtual Networks on Azure? Not quite ready for the big time yet!

Networks are one of the simplest and most robust ways of achieving security in a distributed system. A web application can talk to a database on another server and, using networks, I can restrict access to the database to only the web server and perhaps my desktop machine. I can also set up routes which mean that no other machine could even route to the database server, even if they were on the same physical network.

On the cloud, of course, we don't have loads of techies running around and plugging things into different sockets for us; we achieve the same effect in software - a virtual network. For the most part, these serve the same purposes: configure IP addresses, set up routes, configure gateways and firewalls, etc.

In short, they are a must-have for secure companies.

BUT

If you use these on the Azure cloud, currently, you are severely limited as to which services have complete compatibility with virtual networks. Obviously the expensive services do, like App Service Environments, but many of us do not use these for our current workloads or cashflow, and using VNets with App Services is only partially supported. The same is true for Redis, which only supports VNets on the Premium tier (> £300/month). Surprisingly, it is also supported by Cloud Services, a relic of the olden days, which were a very thin layer over a VM and are generally considered a poor alternative to both App Services and Virtual Machine Scale Sets.

I decided to see whether I could get a PixelPin deployment running on VNets so that only the app service had public access and all the backend services were only on the private network. I was going to worry about operations access later. This is what I found:


  1. Setting up virtual networks is not particularly straight-forward until you understand the various elements, which you might understand in theory but not how they relate to Azure services. For example, you can create a Virtual Network and this will also create a default subnet of IP addresses (which you can't rename afterwards), but it doesn't create or ask about a gateway, which means the network cannot be accessed from outside.
  2. I found out that I needed the expensive Redis tiers in order to work with VNets (I created one anyway, just to carry on, even though I wouldn't have paid that every month)
  3. I discovered something called Service Endpoints, which appear to be a half-hearted, or maybe stop-gap, solution for connecting services that do NOT support VNets natively into your VNet. I needed these for CosmosDB as well as for SQL Azure and storage (storage!). These allow some of the security of the VNet, i.e. you cannot connect to the service except from the subnet (good), but they do not see the VNet's IP addresses, so firewalling them requires you to "allow connections from all Azure services", which is crap!
  4. I then tried to connect my App Services web app to the network. The first lesson was that you needed to use the Standard tier, which is OK, except that when testing these things it is nice to use cheap or free tiers, since you don't need performance, you only want to wire it up. That was fine for production though, since we already use the Standard tier. I then also found out that you can only connect it to one VNet, which is probably OK in a simple case but still a shame that you can't add arbitrary network interfaces to the App Services machine (even if there were a limit of, say, 3 or 4).
Basically, none of it was great. I can't remember what made me finally give up, but there are currently too many limitations.

On Amazon, everything can go into a virtual network, if I understand it correctly, but of course, they are not an MS house and do not support Windows or .NET anywhere near as well as Azure does.

Most of these comparisons are six of one and half-a-dozen of the other, but the danger here is that people might just decide to leave MS and head to AWS because the network segregation is more important than the extra work required to deploy .NET to AWS, so MS need to sort this out with urgency and make it all work much more consistently. Also, instead of forcing people into expensive tiers to try these things, they should impose other restrictions so that people who can afford it and need the performance pay big money, while those of us just testing things out don't have to.

Let's hope they do it soon!

Friday, 11 May 2018

Bootstrapping Yii2 Module and developing locally

I want to build an OpenBadges module for Yii2 for a site I am building, but there were two things I struggled with. Firstly, how do you make a module and get some custom URLs set up, and secondly, how do I build it Composer-style for the future without having to publish a dev version - that might not even work - to Packagist (or elsewhere)?

Using Composer to develop locally:

  1. Create a folder somewhere for your package
  2. Make sure you have a composer.json etc. as if a real project
  3. git init to create a repo and put your files in there, commit etc.
  4. Modify your composer.json in the project you want to bring it into (e.g. my Yii2 project) and add a repository: 
  5. "repositories": [
    {
        "type": "composer",
        "url": "https://asset-packagist.org"
    },
    {
        "type": "vcs",
        "url": "../packages/lukos/yii2-openbadge",
        "options": {
            "symlink": true
        }
    }],
    
  6. The url is a relative or absolute path to the directory that contains your composer.json (not its parent directory)
  7. Run
    composer require "yourname/your-package @dev"
    in the folder of your main project. Don't get confused here, the yourname bit should match what is in your package composer.json and may or may not match the folder it is in!
  8. Running require should update your local vendor folder with the contents of your package but on Windows, it does NOT create symlinks, it will simply copy the files, which is a shame. At this point, however, you can edit them in the vendor folder and once you are happy, copy them back to the package folder and commit the changes to git.
  9. There is a "path" type for a composer repository, which shouldn't need git but I couldn't get that to work so I've used vcs instead.

How to bootstrap a Yii2 module

I wanted a URL that was neater than /openbadge/openbadge/badge or whatever, but this requires a URL rule, which must be added in bootstrap, and bootstrapping is not enabled by default in modules.
  1. make sure your Module class implements yii\base\BootstrapInterface, which defines a single method: public function bootstrap($app)
  2. Inside this method, add something like:
  3. $app->getUrlManager()->addRules([
        new GroupUrlRule([
            'prefix' => $this->pattern,
            'rules' => [
                '' => 'openbadge/'
            ],
        ])
    ], false);
  4. $this->pattern (in my case) is simply 'openbadge' but allows it to be overridden in config if required. The false passed to addRules() makes sure these are the first rules, so my rule will route all requests to /openbadge/action to my (only) controller, called OpenbadgeController.
  5. You then need to ensure you have a bootstrap entry in your project config so that it will run the bootstrap for the module: $config['bootstrap'][] = 'openbadge';

Friday, 4 May 2018

Make sure you sudo openssl dhparam!

This was probably the slowest thing I have ever run on Linux (Raspbian on a Raspberry Pi to be exact) and, to be fair, it does say, "This is going to take a long time".

I think it took something like 20 minutes and then.....

Can't open dhparams.pem for writing, Permission denied!

AAAAAAAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHHHHHHHHHHH

Remember Luke, it can't write it into /etc/nginx unless you use sudo!

Security is so slow, wouldn't it be nice to buy ready-made ones from the internet ;-)

Why you should stop using PayPal to process payments

Introduction

With the new GDPR regulations soon to be enforced, many of us are reflecting on our providers and those we use elsewhere. We are now partially responsible, as Data Controllers, for ensuring that the services we use are also compliant with GDPR - we can no longer say, "That was our supplier who lost the data".

Firstly, most people don't seem to know that the GDPR regulations are already law! The 25th May is not a release date, it is the date that the EU set when enforcement can start. In other words, serious companies could and should already have these principles in place.

Of course, what actually happened was that it revealed some widespread abuses of personal data, with many companies putting it off until the last minute to avoid restricting what they do - after all, knowledge is power! PayPal is a case in point: a large, well-known payment provider who you would expect to treat your data with respect and be at the front of the queue to be compliant.

But NO. There is almost no information anywhere about what they have done and why, and a new Privacy Policy that is effective from 25th May - yes, effective at the last possible date, even though it appears to be finished. Why? It is the first of several stinks. The only reasons would be either that they think 25th May is the start date of the regulations (we would expect the highly paid lawyers at a large company not to be so stupid) or that they are currently not acting in the spirit of the GDPR and want to delay these new regulations for as long as they can get away with it.

Currently, PayPal, when processing credit card payments, even for Guest Payments, require you to tick a box to agree to terms that include marketing, even though you are not dealing with PayPal, you are dealing with the shop that you want to pay. That in itself is unclear and underhanded!

But OK, PayPal are not breaking any laws until the 25th (well actually they are, but this will not be enforced until the 25th May). Let us look instead to their new Privacy Policy to see whether this looks like it is compliant with both the letter and the spirit of GDPR:

tldr; It isn't, you should stop using them!

First Impressions

One of the main principles of GDPR is that the documents should be plain, easy to read and obvious. This document has generally readable language but too many abstract or blanket terms that seem to hide an intention to do other things. For example, the statement, "To operate the Sites and provide the Services, including to...communicate with you about your Account, the Sites, the Services, or PayPal". Communicate with me about PayPal? That sounds like "we will communicate with you about anything we feel like", and this is under the heading "To operate the Sites and provide the Services". In other words, they are saying it is legitimate business to communicate with me about PayPal! Of course, they might mean "about PayPal being unavailable" or "about PayPal changing its name", but these need to be explicit.

The layout is otherwise OK, the headings understandable, although the contact info is a little spread out.

Why Do We Retain Personal Data

This section is quite small and contains a few iffy statements. Firstly, saying that they keep data for their legitimate business purposes is not helpful if they think that this includes marketing, selling data, advertising tracking etc. They might consider these legitimate but that doesn't mean that it is a proportional, reasonable or expected use of the data. Also, it is not helpful to write an unbounded statement like, "We may retain Personal Data for longer periods...if it is in our legitimate business interests". It is a 'nothing' sentence because everyone is obviously allowed to keep data as long as is reasonable for business interests and not prohibited. Sentences like this have a smell of someone trying to hold onto retention rights that wouldn't stack up if they were explicit. 

Of course, it might also be poor quality copy, but that would be unfortunate for a company who must pay millions for legal advice!

How Do We Process Personal Data

This is really where the rubber hits the road since it exposes the heart of most companies for data, control and power and PayPal is no exception.

The sub-headings are fine and understandable. So they process data to operate the service, basically. Great! What does this include?

I already mentioned "communicate with you about your Account, the Sites, the Services, or PayPal", which might be innocent but also sounds a bit too blanket for my liking. "create an account connection between your Account and a third-party account or platform" is another abstract statement. Obviously, if there is a link between me and, for example, my card provider, that would be reasonable and expected. If it was a link to an advertising platform or external data analysis company, it wouldn't (necessarily).

"To manage our business needs", again the problem here is that you can read things in two ways, "improving the services" could mean surveys and unwanted emails from PayPal but it could also just mean watching the system stats to find out if e.g. there is problem for users from the Far East.

They mention "enforce the terms of our Sites and Services" twice - nothing like proof-reading!

The consent part will always be generally OK because, now that consent has to be explicit, it is up to the user to consent to whatever they want. As long, of course, as PayPal actually implements this properly. There are still plenty of sites that automatically opt you in to marketing!

PayPal should clarify some of these. If they are thinking of something specific, provide an example, don't try and abstract the principle so much that it is not clear what they are saying.

Do We Share Personal Data

This is now more to do with whether we trust third parties and/or consider their use reasonable and expected. The opt-out of these needs to be clearer.

"With other members of the PayPal corporate family" Who? This could be a separate legal entity (expected) or some other random service (not expected).

"With other companies that provide services to us". Mostly OK until we get to "send you advertisements for our products..." Are we only talking about if you have opted in? If we opt-out, does this data get removed from those other companies or will it be more Data Rubble that lingers on the web?

"With other financial institutions that we have partnered with..." I don't think so. The wording of this suggests that consent is not involved, in which case this is completely unacceptable! You cannot share data unless it is reasonable, expected or consented to. If I haven't asked to get marketing, then you can't share my data with a company that is selling something.

"With other third parties...". Some very worrying statements here:

  • "If we believe, in our sole discretion, that the disclosure is necessary...". Sorry, that is completely unlawful. You cannot disclose any personal data except for legal reasons and those reasons need to involve a legal entity like the police, not some random person at PayPal.
  • "To protect the vital interests of a person". What? By giving my data away, you will protect me? Someone else? Another very smelly term that needs to be qualified or removed.
  • "To investigate violations..". Again, this needs to be lawful, you cannot disclose someone's data to a third-party purely for your convenience.
  • "To protect our property, Services and legal rights" What?
  • "To facilitate a purchase or sale of all or part of PayPal's business" Completely illegal. Even if PayPal was bought by someone else, the new owner would not automatically inherit access to the data! 
  • "To help assess and manage risk...". Not to third-parties. Maybe in the 1950s but you need a lawful reason to process data and if there is a legal concern, no-one outside PayPal should be given anyone's data without legal intervention.
  • "To companies that we plan to merge with or be acquired by". Again, not sure this automatically lawful.
  • "To support our audit, compliance and governance functions". Nope. Audit and compliance are not lawful basis for someone having access to personal data and since they are not legal requirements (generally), cannot override the GDPR regulations.

Conclusion

There are other parts that are not great but don't raise alarm bells. It is also a shame that they do not use the phrase Data Controller anywhere, although they do provide contact details for their Data Protection Officer and a way to both check FAQs and contact them with any other questions.

Personally, I won't use PayPal until they start seeming like someone on the side of Privacy. They make tonnes of money from payment commissions and have no reason to do all the other stuff on the side unless they are just greedy. If the writing is simply poor and they are not doing anything nefarious, then they need to make some massive changes to the web form for payments and make the wording much more transparent.

Currently for my money, I don't trust them and will wait for some test cases against them.

Monday, 19 March 2018

Mocking SignOutAsync and SignInAsync

It seems that despite Microsoft going to some great lengths on Dependency Injection in .NET Core, there is still a lot of smoke and mirrors with the horrible blob they call HttpContext.

Mocking this in unit tests is doable but a bit of a pain. The easiest way is to set ControllerContext.HttpContext to a DefaultHttpContext, which provides some magic services out of the box, allowing things like the UrlHelperFactory to work as expected (or at least not to crash).

However, there is a problem! It only provides default services for things that work out of the box with safe defaults. If you try and call SignOutAsync or SignInAsync, your test will give you the infamous "System.ArgumentNullException : Value cannot be null. Parameter name: provider", and the call stack will show ServiceProviderServiceExtensions.GetRequiredService().
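
For context, the calls in question are the HttpContext extension methods. A hypothetical controller action of the kind being tested might look like this (the cookie scheme and controller name are just examples, not from the original post):

// A hypothetical action that triggers the error when tested against a bare DefaultHttpContext.
using System.Threading.Tasks;
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Mvc;

public class AccountController : Controller
{
    public async Task<IActionResult> Logout()
    {
        // This extension method resolves IAuthenticationService from
        // HttpContext.RequestServices, which is where the ArgumentNullException comes from.
        await HttpContext.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme);
        return RedirectToAction("Index", "Home");
    }
}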

There is a Stack Overflow answer here, but it uses the deprecated AuthenticationManager class, which is not great.

I played around and created a mock for the auth manager:
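
Something along these lines does the job - a sketch using Moq against IAuthenticationService, the non-deprecated service that the SignInAsync/SignOutAsync extension methods resolve (the original gist may differ in detail):

// Sketch: mock IAuthenticationService, which the SignInAsync/SignOutAsync
// extension methods look up from HttpContext.RequestServices.
using System.Security.Claims;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Http;
using Moq;

var authServiceMock = new Mock<IAuthenticationService>();

authServiceMock
    .Setup(s => s.SignInAsync(It.IsAny<HttpContext>(), It.IsAny<string>(),
        It.IsAny<ClaimsPrincipal>(), It.IsAny<AuthenticationProperties>()))
    .Returns(Task.CompletedTask);

authServiceMock
    .Setup(s => s.SignOutAsync(It.IsAny<HttpContext>(), It.IsAny<string>(),
        It.IsAny<AuthenticationProperties>()))
    .Returns(Task.CompletedTask);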


Then I needed to add a new RequestServices object (of type IServiceProvider), which is null by default in DefaultHttpContext (but don't be fooled because it handles things automagically!).

Request Services Mock Gist
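
In sketch form, building on the authServiceMock above, that looks something like this (AccountController is the hypothetical controller from the earlier example):

// Sketch: give DefaultHttpContext an explicit RequestServices provider
// containing the authentication mock created above.
using Microsoft.AspNetCore.Authentication;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();
services.AddSingleton<IAuthenticationService>(authServiceMock.Object);

var httpContext = new DefaultHttpContext
{
    RequestServices = services.BuildServiceProvider()
};

var controller = new AccountController
{
    ControllerContext = new ControllerContext { HttpContext = httpContext }
};

// The action under test can now be invoked, e.g.:
// var result = await controller.Logout();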

BUT then when I ran all my tests, about 50 of them now failed, including the one I was "fixing" because now I was being told that IUrlHelperFactory and ITempDataDictionaryFactory were not being resolved! What? I hadn't touched them and the RequestServices property was null by default - what was happening?

I remember reading that DefaultHttpContext provides you some basic services by default - although unfortunately, it doesn't do that very obviously! By setting RequestServices, this magic service is obviously removed and things you used to get by default, you don't any longer!

Fortunately, there are default implementations which are easy enough to use for the additional 2 services, and hopefully if you are using any others, they will also have defaults or will be easy to mock:

Additional services to mock Gist
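
Continuing the sketch above, registering the two defaults next to the authentication mock might look like this (the TempData factory just needs some ITempDataProvider, so a Moq stub is enough for tests):

// Sketch: register default implementations for the two services that
// DefaultHttpContext no longer supplies once RequestServices has been replaced.
using Microsoft.AspNetCore.Mvc.Routing;
using Microsoft.AspNetCore.Mvc.ViewFeatures;
using Microsoft.Extensions.DependencyInjection;
using Moq;

services.AddSingleton<IUrlHelperFactory>(new UrlHelperFactory());
services.AddSingleton<ITempDataDictionaryFactory>(
    new TempDataDictionaryFactory(Mock.Of<ITempDataProvider>()));

// Rebuild the provider after adding the extra registrations.
httpContext.RequestServices = services.BuildServiceProvider();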


Wednesday, 31 January 2018

Filebeat as a Webjob on App Services to send IIS logs to Logstash

I am struggling to get certain parts of my ELK stack set up but, surprisingly, setting up Filebeat to forward logs from IIS in App Services worked first time!

Filebeat is a lightweight exe that can do some very basic log parsing and forwarding, either directly to Elasticsearch or, more likely, via Logstash, which is a much heavier-weight, scalable application that can perform various parsing and modification of messages before they go into Elasticsearch. In this case, the IIS logs should be modified in Logstash to give them more useful metadata.

Logstash is too big and resource-hungry to use on App Services (unless it were installed centrally somewhere, but that would likely not work well), but fortunately Filebeat measures in at about 30MB expanded and 8MB zipped, which is easily small enough for App Services.

The steps are straight-forward and this worked for me with v6.1.1 of filebeat for Windows.


  1. Download filebeat and extract the contents to a folder somewhere to edit
  2. You can delete the install/uninstall-as-a-service PowerShell scripts, as well as the reference yml file, to save a few KB!
  3. Add a file called run.cmd which includes the command line .\filebeat.exe -e
  4. It is recommended that you test filebeat locally first to ensure your pipeline is working before you bring Azure into the mix but if you already know that works, you can skip that step.
  5. Edit filebeat.yml using the code as shown at the bottom (keep other stuff if you know you need it). The only specific bit for App Services is the log path.
  6. ZIP the contents of your extracted folder by selecting all files and folders in the directory that contains filebeat.exe and choosing Send to compressed (zipped) folder. Do NOT do this on the parent directory, otherwise the zip will include the parent directory at the top level. Call it something like webjob.zip
  7. In the Azure portal, select the App Service you want to use, choose Web Jobs and the + button to add a new one.
  8. Choose a name, select Continuous as the type and select the ZIP file you created; choose multi-instance if it needs to run on every instance of your app (it probably will, unless you are testing) and press OK.
  9. It will run immediately so make sure that your endpoint is running correctly
Note: The path used on App Services works on my current setup but I cannot tell whether this drive letter and path is guaranteed not to change! I don't currently know how to query the logs directory using e.g. an environment variable.


#=========================== Filebeat prospectors =============================

filebeat.prospectors:

- type: log

  # Change to true to enable this prospector configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - D:\home\LogFiles\http\RawLogs\*.log

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  exclude_lines: ['^#']

#============================= Filebeat modules ===============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

#==================== Elasticsearch template setting ==========================

setup.template.settings:
  index.number_of_shards: 1
  #index.codec: best_compression
  #_source.enabled: false

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["logstash.example.com:5044"]