at Canonical
Brad Marshall
Who am I?
- Unix sysadmin with 18 years' experience
- Currently sysadmin at Canonical
- Worked in environments from small software startups to universities
- Volunteered for Debian, SAGE-AU, HUMBUG, CQLUG
- Presentations/articles at SAGE-AU, AUUG, CQLUG, HUMBUG
Where Is It Used
- Canonistack - dev / testing stack
- Prodstack - production stack
- Stagingstack - staging environments on prodstack
- Prodstack 4.0 - Next version of production stack
- Scalingstack - WIP stack for scaling type environments (builders etc)
- Openstack Integration Lab - testing integration of hardware and software
What's Canonistack?
- Openstack installation anyone in the company can use
- 2 regions, running previous two releases
- Right now we have grizzly and havana
- Allows oversubscribing to get the most usage
- Allows dev to test across multiple releases and prototype things
- Downtime can happen, but improving over time
- Want developers to prefer this over other environments, like AWS
- Good learning environment
Canonistack Sizing
- 10 Compute nodes across 2 regions each with:
- 96G ram
- 24 CPUs
- Between 500G and 2Tb storage
- 3 Swift nodes each with:
- 6G ram
- 8 CPUs
- 3 x 128G storage
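A rough sketch of what oversubscribing means for the compute figures above. It reads "10 Compute nodes across 2 regions" as 10 nodes in total, and the 4.0 CPU and 1.5 RAM ratios are purely illustrative assumptions, not Canonistack's actual settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hardware of one class of node, taken from the sizing slide."""
    count: int
    ram_gb: int
    cpus: int


def schedulable(spec, cpu_ratio, ram_ratio):
    """Return (vcpus, ram_gb) that could be handed to guests at the given ratios."""
    vcpus = spec.count * spec.cpus * cpu_ratio
    ram_gb = spec.count * spec.ram_gb * ram_ratio
    return vcpus, ram_gb


# Assumption: 10 compute nodes in total, each with 96G RAM and 24 CPUs.
canonistack_compute = NodeSpec(count=10, ram_gb=96, cpus=24)

# Ratio 1.0 means no oversubscription, i.e. the physical totals.
physical = schedulable(canonistack_compute, cpu_ratio=1.0, ram_ratio=1.0)

# Hypothetical oversubscription ratios, chosen only for illustration.
oversubscribed = schedulable(canonistack_compute, cpu_ratio=4.0, ram_ratio=1.5)

print("physical (vcpus, ram_gb):", physical)
print("oversubscribed (vcpus, ram_gb):", oversubscribed)
```

The higher the ratios, the more guests fit, at the cost of contention when many of them are busy at once.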
What's Prodstack?
- Production deployment of Openstack
- Stagingstack is the staging version of prodstack
- Currently running internally developed services
- Running 12.04 with Folsom currently
- Using Ubuntu Cloud Archive which backports versions of Openstack to LTS
What runs in Prodstack
- Certification website
- Product search - Amazon search on desktop
- Video search
- Music streaming
- Parts of website
- Juju charms website
- Juju GUI website
- Summit website
- Cassandra backends for
- Various private git and gerrit services
- Ubuntu Developer API docs
- Many others
Prodstack Sizing
- 20 Compute nodes each with:
- 96G memory
- 1Tb storage
- 24 CPUs
- 5 Swift storage nodes each with:
- 16G memory
- 3 x 1Tb storage
- 16 CPUs
- 6 Ceph nodes each with:
- 16G memory
- 11 spindles with ~20Tb of space
- 16 CPUs
Prodstack 4.0
- Next version of Prodstack in progress
- Will be Havana on Precise using Ubuntu Cloud Archive
- HA on Neutron and firewalls to start
- Once services are migrated off Prodstack, will integrate the old hardware
Prodstack 4.0 initial sizing
- 7 x Compute nodes each with:
- 96G ram
- 24 CPUs
- 1Tb SAS storage
- 3 x Swift nodes each with:
- 16G ram
- 16 CPUs
- 11 x 3Tb SATA storage
- 3 x Ceph nodes each with:
- 16G ram
- 16 CPUs
- 11 x 2TB SAS storage
Software Stack
- Openstack
- Ubuntu
- MaaS
- Juju
- Landscape
How to deploy
- MaaS and Juju to deploy Openstack to bare metal
- Juju to deploy to Openstack
- Landscape to manage servers
Service Orchestration vs Configuration Management
- Configuration management is like a universal remote that lets you directly control the TV, DVD, set top box etc - eg Puppet, Chef
- Service orchestration is like the remotes that let you say what you want to do, ie "Watch a DVD", and handle the rest for you - eg Juju, Heat
Juju
- Service orchestration for the cloud
- Can deploy to Openstack, MaaS, AWS, HP Cloud, Azure, Joyent (soon), local containers
- Providers being added all the time
- Charms are pre-setup, configurable services
- Juju expresses relationships between charms
- Bundle is a set of services expressed as charms
- You can control where things are deployed via juju deploy --to instance
- Juju manual provisioning means you can deploy to hosts that are pre-installed via ssh - experimental at this stage
Juju example
$ juju bootstrap
$ juju deploy juju-gui --to 0
$ juju deploy -n 3 --config ceph.yaml ceph --constraints "mem=2G"
$ juju deploy -n 3 --config ceph.yaml ceph-osd --constraints "mem=2G"
$ juju add-relation ceph ceph-osd
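The same deployment can be thought of as data: a set of services plus the relations between them. The sketch below is only an illustration of that idea. The dictionary layout and the helper names `deploy_command` and `bundle_commands` are hypothetical and are not Juju's real bundle format. It only uses the flags shown in the commands above, and it builds the command lines without running them.

```python
import shlex


def deploy_command(charm, units=1, config=None, constraints=None, to=None):
    """Build a 'juju deploy' argument list from the options used on the slide."""
    cmd = ["juju", "deploy"]
    if units != 1:
        cmd += ["-n", str(units)]
    if config:
        cmd += ["--config", config]
    cmd.append(charm)
    if to is not None:  # machine 0 is a valid target, so compare against None
        cmd += ["--to", str(to)]
    if constraints:
        cmd += ["--constraints", constraints]
    return cmd


def bundle_commands(bundle):
    """Turn the hypothetical bundle dict into an ordered list of commands."""
    services = bundle["services"]
    commands = [["juju", "bootstrap"]]
    for name, options in services.items():
        commands.append(deploy_command(name, **options))
    for left, right in bundle["relations"]:
        # A relation only makes sense between services that are deployed.
        if left not in services or right not in services:
            raise ValueError(f"relation refers to unknown service: {left}, {right}")
        commands.append(["juju", "add-relation", left, right])
    return commands


# Hypothetical description of the example deployment above.
bundle = {
    "services": {
        "juju-gui": {"to": 0},
        "ceph": {"units": 3, "config": "ceph.yaml", "constraints": "mem=2G"},
        "ceph-osd": {"units": 3, "config": "ceph.yaml", "constraints": "mem=2G"},
    },
    "relations": [("ceph", "ceph-osd")],
}

commands = bundle_commands(bundle)
for command in commands:
    print("$", shlex.join(command))
```

Keeping the description separate from the commands is what makes a deployment repeatable: the same data can be replayed against another environment.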
Juju GUI
$ juju bootstrap
$ juju deploy juju-gui
- GUI that allows control over Juju
- Can add new services, scale out, configure
- Drag and drop services onto canvas, add relationships
MaaS - Metal as a Service
- Controls deployment of physical hosts
- Lets you treat bare metal the same as a cloud guest
- Actively developed, designed for hyperscale
- Juju provider can use MaaS natively
- Has a RESTful API
- Devs currently working on improving the user experience
- Virtual MaaS allows simulation of MaaS using virtual machines
Mojo - CI for Services
- Continuous integration for Juju based services
- Uses Jenkins to deploy and test services
- Used to ensure services are still deployable after changes
- Uses Nagios checks defined for services as well as anything else
- Can test upgrades of charms or anything else you can script
- Looking at using it for deployment as well
- Still a work in progress, but already helping find issues
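As a rough illustration of reusing Nagios checks as deployment tests, the sketch below runs check commands and interprets their exit codes using the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is not how Mojo itself is implemented. The function names and the two stand-in checks are hypothetical, and a real setup would point at actual check plugins.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=30):
    """Run one check command and map its exit code to a Nagios state."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A check that cannot run or hangs tells us nothing, so report UNKNOWN.
        return "UNKNOWN"
    return NAGIOS_STATES.get(proc.returncode, "UNKNOWN")


def deployment_passes(checks):
    """Run all checks; the deployment passes only if every check is OK."""
    results = {name: run_check(argv) for name, argv in checks.items()}
    return all(state == "OK" for state in results.values()), results


# Stand-in checks that simply exit with a fixed code (hypothetical examples).
checks = {
    "always-ok": [sys.executable, "-c", "raise SystemExit(0)"],
    "always-critical": [sys.executable, "-c", "raise SystemExit(2)"],
}

passed, results = deployment_passes(checks)
print("deployment passes:", passed)
for name, state in results.items():
    print(f"  {name}: {state}")
```

A CI job could fail the build whenever the first returned value is false, so a charm change that breaks a service is caught before it reaches staging.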
Workflow for apps
- Devs work up a Juju charm and work out how to deploy it on Canonistack
- Webops (group who deal with devs) take charm and work it into a usable deployment story on stagingstack, going back and forth with devs as required
- Once everyone is happy the stack is deployed into prodstack, and into production by ops
- Updates to apps are done via devs, rolled into stagingstack and then onto prodstack by ops
- This allows good involvement between ops and devs, while still keeping separation of responsibilities
Problems
- Needs more hardware to move to cloud initially to seed the cloud - can't just magically replace everything overnight and pull the hardware in
- Devs might think they'll be able to deploy anything out immediately - still need testing in production etc
- Devs getting up to speed with how to write apps to use cloud
- Ops have concerns around visibility of usage etc - tooling isn't quite there in some cases
- Bandwidth limitations - looking at bonded nics for next version of prodstack
Problems cont
- Picking the right hardware for cloud compute nodes - need more IOPS, since bringing up multiple guests with competing resource requirements can cause local disk IO storms
- Making sure instances deployed in the cloud aren't running on same physical hosts
- Want HA + 1 (be able to remove a host to upgrade and still keep HA)
- Need cross environment deployments in Juju
- Colocating services in Juju is possible now - tricky though, since charms usually assume they're the only thing
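The placement and "HA + 1" concerns above can be expressed as a simple check. The sketch below is an assumption-laden illustration, not a Juju or Openstack feature: the `placement` mapping of unit names to physical hosts and both helper functions are hypothetical, and obtaining that mapping from a real cloud is out of scope here.

```python
from collections import Counter


def colocated_units(placement):
    """Return {host: [units]} for every host carrying more than one unit."""
    by_host = {}
    for unit, host in placement.items():
        by_host.setdefault(host, []).append(unit)
    return {host: sorted(units) for host, units in by_host.items() if len(units) > 1}


def survives_host_loss(placement, min_units):
    """True if removing any single host still leaves at least min_units units.

    With min_units=2 this models "HA + 1": one host can be taken out for an
    upgrade and the service is still redundant.
    """
    per_host = Counter(placement.values())
    worst_case_loss = max(per_host.values(), default=0)
    return len(placement) - worst_case_loss >= min_units


# Hypothetical placements of a three-unit service.
spread = {"web/0": "host-a", "web/1": "host-b", "web/2": "host-c"}
stacked = {"web/0": "host-a", "web/1": "host-a", "web/2": "host-b"}

print(colocated_units(spread), survives_host_loss(spread, min_units=2))
print(colocated_units(stacked), survives_host_loss(stacked, min_units=2))
```

The worst case is losing the host that carries the most units, so only that host needs to be considered.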
Benefits
- Vastly improved our density of services on servers, decreasing capex (buying fewer servers) and opex (powering fewer servers)
- Dynamic scale up and down of services - ie, we can add more website units at release time
- True reproducibility of deployments
- Staging environment that closely matches prod without too much extra hardware required
- Both devs and ops develop an awareness of each other's needs
- IS is involved in the development side of things, and helps shape things to work in production
- We are upstream for the OS - opportunity to get things in
- Dogfooding our product and providing feedback to devs
Contact me at