Setting up a load-balancing cluster with Apache and Tomcat

  • 2020-05-13 03:59:05
  • OfStack

1. Concept of cluster and load balancing

(1) the concept of cluster

A cluster is a loosely coupled set of computing nodes composed of two or more node machines (servers) that presents users with a single client view of network services or applications (including databases, Web services, file services, etc.) and provides near-fault-tolerant recovery capabilities. A cluster system generally interconnects two or more node server systems through hardware and software; each node is an independent server running its own processes. These processes communicate with one another and, to a network client, behave as a single system, working together to provide applications, system resources, and data to the user. Beyond acting as a single system, a cluster can recover from server-level failures. A cluster can also grow its processing power from within simply by adding servers to it, and its system-level redundancy provides inherent reliability and availability.
(2) cluster classification
1. High-performance scientific computing clusters:
Clusters of IA (Intel Architecture) servers built to solve complex scientific computing problems; they are the basis of parallel computing. Instead of a dedicated parallel supercomputer consisting of tens to tens of thousands of independent processors, they use a group of 1-, 2-, or 4-CPU IA servers linked by high-speed connections, communicating over a common message-passing layer to run parallel applications. The processing power of such a computing cluster rivals that of a true massively parallel machine, and its price/performance ratio is excellent.
2. Load balancing cluster:
Load-balancing clusters provide a more practical system for enterprise requirements. Such a system distributes incoming load as equally and reasonably as possible across the server cluster; the load in question may be application-processing load or network-traffic load. This kind of system is ideal for a large number of users running the same set of applications. Each node handles a share of the load, and load can be redistributed among the nodes dynamically to maintain balance. The same applies to network traffic: typically, a web server application that receives more incoming traffic than it can process quickly needs to send traffic on to other nodes. The load-balancing algorithm can also be tuned to the different resources available on each node or to special network conditions.
3. High availability cluster:
To ensure the high availability of the cluster's overall services, the fault tolerance of computing hardware and software must be considered. If a node in a high-availability cluster fails, another node takes over its work, and the overall system environment remains consistent from the user's point of view.

In practical cluster systems, these three basic types are often mixed and combined.

(3) typical cluster

Scientific computing cluster:
1. Beowulf
When it comes to Linux clusters, many people's first thought is Beowulf, the most famous Linux scientific-computing cluster system. It is in fact a generic term for a set of common software packages that run on the Linux kernel. These include popular message-passing APIs such as the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM), modifications to the Linux kernel to allow several Ethernet interfaces to be combined, high-performance network drivers, changes to the virtual memory manager, and the Distributed Inter-Process Communication (DIPC) service. The common global process-identifier space allows any process to be accessed from any node via the DIPC mechanism.
2. MOSIX
Beowulf is like a cluster-enabling plug-in installed on a system, providing application-level clustering. MOSIX, by contrast, modifies the Linux kernel itself, providing clustering at the system level, completely transparent to applications: existing applications run normally on a MOSIX system without any changes. Any node in the cluster can be freely added or removed, taking over the work of other nodes or extending the system. MOSIX uses adaptive process load-balancing and memory-ushering algorithms to maximize overall performance. Application processes can migrate between nodes to take advantage of the best resources, much as a symmetric multiprocessor system can switch applications between processors. However, because MOSIX implements clustering by modifying the kernel, some system-level applications will not work properly due to compatibility issues.

Load balancing/high availability clustering

3. LVS (Linux Virtual Server)
This is a project initiated and led by a Chinese developer.
It is a load balancing/high availability cluster for high-volume web applications (such as news services, online banking, e-commerce, etc.).
LVS is built as a cluster of one director (master server; usually two for redundancy) and several real servers. The real servers actually provide the service, while the director dispatches requests to them according to the specified scheduling algorithm. The structure of the cluster is transparent to users: a client communicates only with a single IP address (the cluster's virtual IP), so from the client's perspective there is only one server.
The real servers can provide many services, such as ftp, http, dns, telnet, nntp, smtp, and so on. The director is responsible for controlling the real servers: when a client sends a request to LVS, the director uses its scheduling algorithm to designate a real server to answer it, while the client only ever communicates with the load balancer's IP (the virtual IP, VIP).

Other clusters:

Today's cluster systems come in a wide variety. Most OS and server vendors provide system-level cluster products, most typically the various dual-machine systems, plus the cluster systems offered by research institutions. On top of these sit the application-level cluster systems from software vendors: database clusters, application-server clusters, web-server clusters, mail clusters, and so on.

(4) load balancing

1. Concept

With the rapid growth of traffic and data volumes, the processing load and computing intensity on every core part of an existing network grow correspondingly with business volume, until a single server can no longer bear it. In this situation, throwing away existing equipment for a wholesale hardware upgrade wastes the existing investment, and the next jump in business volume forces yet another round of expensive upgrades; even the highest-performing single device may still fail to keep up with current growth.

Load balancing (Load Balance) is a cheap, efficient, and transparent way to extend the bandwidth of existing network devices and servers, increase throughput, strengthen network data-processing capacity, and improve network flexibility and availability.

2. Features and classification

Server load balancing is generally used to improve the overall processing capacity of a group of servers and to improve their reliability, availability, and maintainability. The ultimate goal is to speed up server response and thereby improve the user experience.

Structurally, load balancing divides into local load balancing (Local Server Load Balance) and global load balancing (Global Server Load Balance). The former balances load across a local server farm; the latter balances load across server farms placed in different networks and different geographic locations.

Regional load balancing has the following characteristics:

(1) solve the problem of network congestion, provide services nearby, and realize geographical location independence
(2) provide better access quality to users
(3) improve the response speed of the server
(4) improve the utilization efficiency of servers and other resources
(5) avoid single point of failure of data center

3. Main application of load balancing technology

(1) DNS load balancing. The earliest load-balancing technique was implemented through DNS: multiple addresses are configured under the same name, so clients resolving that name receive different addresses and therefore reach different servers, achieving load balancing. DNS load balancing is simple and effective, but it cannot distinguish between servers or reflect their current running state.
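As a sketch, DNS round-robin is nothing more than several A records under one name; the zone-file fragment below is illustrative only (the host name and addresses are made up):

```
; round-robin: www resolves to three servers in turn
www    IN  A   192.0.2.10
www    IN  A   192.0.2.11
www    IN  A   192.0.2.12
```

Each resolver query returns the records in rotated order, so successive clients land on different servers.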
(2) Proxy-server load balancing uses a proxy server to forward requests to internal servers. This acceleration mode can clearly improve access speed for static web pages, and the same technique can be used to forward requests evenly to multiple servers for load-balancing purposes.
(3) Address-translation-gateway load balancing uses a gateway supporting load-balanced address translation, which maps one external IP address to multiple internal IP addresses and dynamically assigns an internal address to each TCP connection request, thus achieving load balancing.
(4) In-protocol load balancing. Besides the three methods above, some protocols have built-in load-balancing features, such as the redirection capability of the HTTP protocol. HTTP runs at the highest layer above the TCP connection.
(5) NAT load balancing. NAT (Network Address Translation) simply means converting one IP address into another; it is generally used to translate between unregistered internal addresses and legal, registered Internet IP addresses. It is suitable when Internet IP addresses are scarce and you do not want outsiders to know the internal network structure.
(6) Reverse-proxy load balancing. An ordinary proxy forwards internal users' connection requests to servers on the Internet: the client must specify the proxy server and send it the requests that would otherwise go directly to an Internet server. A reverse proxy (Reverse Proxy), by contrast, accepts connection requests from the Internet, forwards them to servers on the internal network, and returns the results to the requesting Internet clients; externally, the proxy appears to be a single server. Reverse-proxy load balancing dynamically forwards Internet connection requests to multiple internal servers in this reverse-proxy manner, achieving load balancing.
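In Apache itself, reverse-proxy load balancing can be sketched with mod_proxy_balancer, an alternative to the mod_jk approach used later in this article (the member addresses and balancer name below are illustrative):

```
# httpd.conf sketch: reverse-proxy load balancing with mod_proxy (Apache 2.2)
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so

# two backend members share the traffic
<Proxy balancer://mycluster>
    BalancerMember http://127.0.0.1:8088
    BalancerMember http://127.0.0.1:9088
</Proxy>
ProxyPass / balancer://mycluster/
```

mod_proxy speaks plain HTTP to the backends, whereas mod_jk uses the AJP protocol; either can front a pair of Tomcats.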
(7) Hybrid load balancing. In some large networks, the server groups differ in hardware, size, and the services they provide, so it can make sense to choose the most appropriate load-balancing method for each group, and then balance load once more across the groups (treating the multiple groups as one new server group) to provide service as a whole, achieving the best performance. We call this approach hybrid load balancing. It is also used when a single balancing device cannot handle a large volume of connection requests.

2. Set up clusters and realize load balancing

(1) preliminary preparation

My system is Windows XP Professional, so what I'm going to do is build a cluster using mod_jk with one Apache and multiple Tomcat instances (two in this example). Here's what to prepare first:

1. JDK. The version I use is jdk1.5.0_06, downloadable from Sun's Java download site.
2. Apache. I use version 2.2.4; the download address is http://apache.justdn.org/httpd/binaries/win32/apache_2.2.4-win32-x86-openssl-0.9.8d.msi
3. Tomcat. I use the zip (unpacked) distribution of 5.5. Note that the installer version cannot be used here, because installing two copies of Tomcat on one machine via the installer causes errors. The download address is http://apache.mirror.phpchina.com/tomcat/tomcat-5/v5.5.25/bin/apache-tomcat-5.5.25.zip
4. jk. The jk connector originally had two versions, but version 2 has been abandoned; the currently available version is 1.2.25. Each Apache version has a specific jk build, so the one used here must be the build for apache-2.2.4. Its download address is http://www.apache.org/dist/tomcat/tomcat-connectors/jk/binaries/win32/jk-1.2.25/mod_jk-apache-2.2.4.so

With these four things, we can start clustering.

(2) installation

1. Anyone who needs this article will already be familiar with installing the JDK. Just a reminder: don't forget to configure your environment variables.
2. Installing Apache is not difficult. You only need to enter the domain name, server name, and administrator email during installation; beyond that, make sure port 80 on the machine is not being used by another program. The install path is personal preference, and the other defaults are fine. After a successful installation there is an icon in the tray area at the lower right of the screen, from which Apache can be started. If the little red dot turns green, the service has started normally (if it does not start, something was misconfigured during installation; uninstalling and reinstalling is the simplest fix). With the default port 80, open a browser at http://localhost/ and you should see "It works". Then we're ready for the next step.
3. Extract Tomcat and make two copies, named tomcat-5.5.25_1 and tomcat-5.5.25_2; the contents of the two folders are exactly the same. To cluster on one machine, however, I must make sure the two Tomcats' ports do not collide. Go to tomcat-5.5.25_1/conf, open server.xml in a text editor, and change Tomcat's default HTTP port from 8080 to 8088 (this change is only necessary because another Tomcat on my machine already uses 8080). Then go to tomcat-5.5.25_2/conf and change its 8080 as well. The exact value does not matter, as long as it does not clash with another program's port. With that, Tomcat is installed.
4. jk is a connector module; it needs no installation. Just copy mod_jk-apache-2.2.4.so into the modules folder of the Apache installation directory.

With that, the installation is complete and the configuration begins.

(3) configuration

This part is the key to building the cluster, so I'll describe it in as much detail as I can.

1. Configure tomcat

To prevent conflicts, go to the second Tomcat's home directory, then its conf directory, and open server.xml to adjust the ports. I simply add 1000 to every port in the file; for example, the original port 8009 becomes 9009. You don't have to follow the same scheme; just make sure there are no conflicts. These port values will be needed again when configuring Apache.
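For reference, these are the three ports that matter in a Tomcat 5.5 server.xml, shown here with the second instance's +1000 values (a minimal sketch: attribute lists are trimmed, and your file will contain more elements and attributes than this):

```xml
<!-- shutdown port (default 8005, here 9005) -->
<Server port="9005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <!-- HTTP connector (default 8080; 9088 on this instance) -->
    <Connector port="9088" protocol="HTTP/1.1"/>
    <!-- AJP connector that mod_jk connects to (default 8009, here 9009) -->
    <Connector port="9009" protocol="AJP/1.3"/>
    <Engine name="Catalina" defaultHost="localhost">
      <Host name="localhost" appBase="webapps"/>
    </Engine>
  </Service>
</Server>
```

The AJP port is the one that must match the worker definitions in workers.properties below.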

2. Configure apache

(1) go to the apache home directory, then go to the conf folder, open httpd.conf with a text editor, and add the following lines at the end of the file:

# Load the mod_jk module
LoadModule jk_module modules/mod_jk-apache-2.2.4.so

# Configure mod_jk
# The workers (cluster members) definition file
JkWorkersFile conf/workers.properties
# The file mapping request URIs to workers
JkMountFile conf/uriworkermap.properties
# jk's log file and logging level
JkLogFile logs/mod_jk.log
JkLogLevel warn

(2) In the same directory, create a new file workers.properties, which describes the web containers. The file reads as follows:

# worker list
worker.list=controller,status

# first server, named s1
# ajp13 port number, from the Tomcat server.xml (default 8009)
worker.s1.port=8009
# Tomcat host address; use the real IP if it is not this machine
worker.s1.host=localhost
worker.s1.type=ajp13
# weight: the higher the value, the more requests the worker receives
worker.s1.lbfactor=1

# second server, named s2
worker.s2.port=9009
worker.s2.host=localhost
worker.s2.type=ajp13
worker.s2.lbfactor=1

# the load-balancing worker, named controller
worker.controller.type=lb
# number of retries
worker.controller.retries=3
# comma-separated list of workers that share the requests
worker.controller.balanced_workers=s1,s2
# many articles say to set this to 1, but 0 works for me
worker.controller.sticky_session=0
#worker.controller.sticky_session_force=1

worker.status.type=status

(3) Still in the same directory, create a new file uriworkermap.properties with the following contents:

# all requests are handled by the controller worker
/*=controller
# requests to /jkstatus are handled by the status worker
/jkstatus=status

# requests for static resources are NOT sent to the controller
# (Apache serves them itself)
!/*.gif=controller
!/*.jpg=controller
!/*.png=controller
!/*.css=controller
!/*.js=controller
!/*.htm=controller
!/*.html=controller

"Here!" Similar to "! "in java. "Means" not ".

With that, the Apache side is configured.

3. Modify the Tomcat configuration (this must be done in both Tomcat copies).

Open server.xml again as in step 1, find the line < Engine name="Catalina" defaultHost="localhost" >, and add jvmRoute="s1" inside it, making it: < Engine name="Catalina" defaultHost="localhost" jvmRoute="s1" >. Here, s1 is the worker name configured for load balancing in step 2: the Tomcat whose AJP port matches s1's port gets jvmRoute="s1", and the second Tomcat gets jvmRoute="s2".
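Concretely, the Engine lines in the two copies end up as follows (each jvmRoute must match the worker whose AJP port that instance listens on):

```xml
<!-- tomcat-5.5.25_1/conf/server.xml (AJP port 8009 = worker s1) -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="s1">

<!-- tomcat-5.5.25_2/conf/server.xml (AJP port 9009 = worker s2) -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="s2">
```

With sticky sessions enabled, mod_jk appends the jvmRoute to the session ID and uses it to route follow-up requests back to the same node; with sticky_session=0 as above, it is not strictly needed but does no harm.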

This completes the configuration.

(4) operation

Go into each Tomcat's bin directory and run startup.bat to start it, then restart Apache. If nothing goes wrong, the two Tomcat console windows should take turns printing log messages as requests come in, and the session is shared at this point.
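To see the balancing with your own eyes, you can drop a small test page into the webapps/ROOT folder of both Tomcat copies (the file name test.jsp and the node string are my own choices, not part of the original setup) and reload http://localhost/test.jsp a few times; with sticky sessions off, the reported node should alternate:

```jsp
<%-- test.jsp: shows which Tomcat instance served this request --%>
<html>
<body>
<%
    // change this string to "s2" in the copy deployed to the second Tomcat
    String node = "s1";
    out.println("Served by node: " + node + "<br/>");
    out.println("Session ID: " + session.getId());
%>
</body>
</html>
```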

At this point, the cluster is set up and the load balancing is done.

