Wednesday, February 23, 2011

Choosing the right scheduling algorithm for a Linux-Based Load Balancer

I'm currently researching which scheduling algorithm to use for a Linux-based load balancer in front of a dual front-end Moodle installation.
The actual load balancer implementation will be detailed later.

This post outlines the scheduling algorithms available in IPVS and considers which would be the best option for my scenario. The assumption in my case is that both web servers are identical, from hardware specifications to OS to PHP code.

From the CentOS Documentation on IPVS Scheduling Algorithms:

Round-Robin Scheduling
Distributes each request sequentially around the pool of real servers. Using this algorithm,
all the real servers are treated as equals without regard to capacity or load. This scheduling
model resembles round-robin DNS but is more granular due to the fact that it is network-connection
based and not host-based. LVS round-robin scheduling also does not suffer the
imbalances caused by cached DNS queries.
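
As a rough illustration of the rotation (not how the kernel implements it), here is a small POSIX shell sketch. The function name rr_pick and the server names web1 and web2 are made up for this example.

```shell
# Toy round-robin: request number N goes to server (N mod pool size).
# rr_pick and the server names are hypothetical, for illustration only.
# Usage: rr_pick REQUEST_NUMBER SERVER...
rr_pick() (
  request=$1
  shift
  # Skip past (request mod pool size) servers; the next one takes the request.
  shift $(( request % $# ))
  echo "$1"
)

for n in 0 1 2 3; do
  echo "request $n -> $(rr_pick "$n" web1 web2)"
done
```

With two servers the requests simply alternate between web1 and web2, no matter how busy either one is.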

Weighted Round-Robin Scheduling
Distributes each request sequentially around the pool of real servers but gives more jobs to
servers with greater capacity. Capacity is indicated by a user-assigned weight factor, which
is then adjusted upward or downward by dynamic load information.
Weighted round-robin scheduling is a preferred choice if there are significant differences in
the capacity of real servers in the pool. However, if the request load varies dramatically, the
more heavily weighted server may answer more than its share of requests.
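
To get a feel for what the weights do, here is a naive shell sketch that expands each weight into that many turns per cycle. This is a simplification I am assuming for illustration; the real scheduler interleaves the servers rather than sending bursts to one of them. The function name wrr_cycle and the NAME:WEIGHT notation are invented for this example.

```shell
# Naive weighted round-robin: a server with weight W gets W turns per cycle.
# wrr_cycle and the NAME:WEIGHT format are hypothetical, for illustration only.
# Usage: wrr_cycle NAME:WEIGHT...   (prints one full cycle, one server per line)
wrr_cycle() (
  for entry in "$@"; do
    name=${entry%%:*}     # text before the colon
    weight=${entry##*:}   # text after the colon
    turn=0
    while [ "$turn" -lt "$weight" ]; do
      echo "$name"
      turn=$((turn + 1))
    done
  done
)

wrr_cycle web1:3 web2:1
```

Here web1 takes three of every four requests. With equal weights, as in my two-server case, this collapses into plain round-robin.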

Least-Connection
Distributes more requests to real servers with fewer active connections. Because it keeps
track of live connections to the real servers through the IPVS table, least-connection is a
type of dynamic scheduling algorithm, making it a better choice if there is a high degree of
variation in the request load. It is best suited for a real server pool where each member
node has roughly the same capacity. If a group of servers have different capabilities,
weighted least-connection scheduling is a better choice.

Weighted Least-Connections (default)
Distributes more requests to servers with fewer active connections relative to their capacities.
Capacity is indicated by a user-assigned weight, which is then adjusted upward or
downward by dynamic load information. The addition of weighting makes this algorithm
ideal when the real server pool contains hardware of varying capacity.
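
The selection rule can be sketched in shell as "pick the server with the lowest connections-to-weight ratio". This is a simplified model under my own assumptions: it only looks at a single connection count per server, and the real scheduler's bookkeeping is more involved. The function name wlc_pick and the NAME:WEIGHT:CONNECTIONS notation are invented for this example.

```shell
# Simplified weighted least-connection: choose the lowest connections/weight.
# wlc_pick and the NAME:WEIGHT:CONNS format are hypothetical.
# Usage: wlc_pick NAME:WEIGHT:CONNS...
wlc_pick() (
  best_name=
  best_weight=0
  best_conns=0
  for entry in "$@"; do
    name=${entry%%:*}
    rest=${entry#*:}
    weight=${rest%%:*}
    conns=${rest#*:}
    # A weight of 0 means "send nothing here" in this sketch.
    [ "$weight" -gt 0 ] || continue
    # Compare conns/weight < best_conns/best_weight by cross-multiplying,
    # which avoids fractions in integer-only shell arithmetic.
    if [ -z "$best_name" ] ||
       [ $((conns * best_weight)) -lt $((best_conns * weight)) ]; then
      best_name=$name
      best_weight=$weight
      best_conns=$conns
    fi
  done
  echo "$best_name"
)

wlc_pick web1:1:10 web2:1:4   # equal weights: behaves like plain least-connection
wlc_pick web1:3:9 web2:1:4    # web1 has more connections but three times the capacity
```

With equal weights this is just least-connection, which is the situation for two identical web servers.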

Locality-Based Least-Connection Scheduling
Distributes more requests to servers with fewer active connections relative to their destination
IPs. This algorithm is designed for use in a proxy-cache server cluster. It routes the
packets for an IP address to the server for that address unless that server is above its capacity
and has a server in its half load, in which case it assigns the IP address to the least
loaded real server.

Locality-Based Least-Connection Scheduling with Replication Scheduling
Distributes more requests to servers with fewer active connections relative to their destination
IPs. This algorithm is also designed for use in a proxy-cache server cluster. It differs
from Locality-Based Least-Connection Scheduling by mapping the target IP address to a
subset of real server nodes. Requests are then routed to the server in this subset with the
lowest number of connections. If all the nodes for the destination IP are above capacity, it
replicates a new server for that destination IP address by adding the real server with the
least connections from the overall pool of real servers to the subset of real servers for that
destination IP. The most loaded node is then dropped from the real server subset to prevent
over-replication.

Destination Hash Scheduling
Distributes requests to the pool of real servers by looking up the destination IP in a static
hash table. This algorithm is designed for use in a proxy-cache server cluster.

Source Hash Scheduling
Distributes requests to the pool of real servers by looking up the source IP in a static hash
table. This algorithm is designed for LVS routers with multiple firewalls.
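
The idea behind both hash schedulers is that the same IP always maps to the same real server. A minimal shell sketch of that idea follows, using the source IP. The cksum checksum is only a stand-in hash that I am assuming for illustration (IPVS uses its own hash table), and the function name sh_pick and the server names are made up.

```shell
# Toy source hashing: the same client IP always lands on the same server.
# cksum is a stand-in hash for illustration; sh_pick is a hypothetical name.
# Usage: sh_pick SOURCE_IP SERVER...
sh_pick() (
  ip=$1
  shift
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  hash=$(printf '%s' "$ip" | cksum | cut -d ' ' -f 1)
  # Map the checksum onto the pool and skip to that server.
  shift $(( hash % $# ))
  echo "$1"
)

sh_pick 10.0.0.7 web1 web2
sh_pick 10.0.0.7 web1 web2   # same client, same server as the line above
```

The trade-off is that the choice ignores load entirely, so a handful of busy clients can end up on the same server.
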

I'm currently doing some testing with one or two of the more viable options and will follow up with my choice (and why I chose it).

Part II here.
-n

Tuesday, February 15, 2011

Using the Linux diff command

I need to do a Moodle minor version upgrade (1.9.x to 1.9.y), but my codebase is highly customized. For a vanilla installation of Moodle, an upgrade is very straightforward, but how many Moodle installs are actually vanilla?

What I intend to do is a side-by-side comparison to check, from a file standpoint, the differences between the updated Moodle core and my customized version.

0. Copy the production Moodle code to a test machine (if possible).

1. On the test machine, download the latest version of Moodle from www.moodle.org and extract it to a folder in tmp.

2. Run the following command:
diff -qry /path/to/current/code /path/to/downloaded/code > /pipe/to/textfile.txt

3. Have fun comparing files. I recommend opening the file in a spreadsheet editor.
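
To make the spreadsheet step easier, the brief diff output can be turned into tab-separated columns. The sketch below is one possible way to do that, not part of the procedure above: the function name diff_report is made up, and it assumes a diff that prints the usual "Files ... differ" and "Only in ..." messages in the C locale. It uses only -q and -r, since brief mode lists which files differ rather than their contents.

```shell
# Turn `diff -qr` output into tab-separated rows for a spreadsheet.
# diff_report is a hypothetical helper name, for illustration only.
# Usage: diff_report /path/to/current/code /path/to/downloaded/code
diff_report() {
  tab=$(printf '\t')
  # LC_ALL=C keeps diff's messages in English so the patterns below match.
  LC_ALL=C diff -qr "$1" "$2" | sed \
    -e "s/^Files \(.*\) and \(.*\) differ\$/changed${tab}\1${tab}\2/" \
    -e "s/^Only in \(.*\): \(.*\)\$/only${tab}\1${tab}\2/"
}
```

Each row starts with "changed" (followed by both file paths) or "only" (followed by the directory and the file name), so the result can be redirected to a text file and sorted or filtered by the first column.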

Cheers,
-n