 2006 Budget Brainstorm


lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2005-08-17 : 16:53:22
I'm a newly installed DBA for an insurance company and I've been tasked with providing my budget items for 2006. Basically, the objectives for next year include "100% guarantee of on-line availability of our sites", "tools to proactively monitor system performance", "potential tools that would enhance our ability to develop or document", and "proper fail-over for all business critical systems".

We currently have three SQL servers: one each for production, staging and development. The production machine is a quad-processor 2.0 GHz Xeon with 16 GB RAM, an OS drive and three RAID 5 arrays (Backups + Apps, Data, Logs). The staging machine is both our testing environment and our DR. It mostly mirrors the production setup, except that it has only two processors. Development is a stunted machine with dual 1.13 GHz PIIIs, 2.25 GB RAM, and ~60 GB disk space.

We use SQL Litespeed for production and staging and will soon be getting some licenses for Red Gate Software's SQL Compare tools. For monitoring and notification, we have a Nagios installation that monitors some key indicators such as free disk space and service status and kicks off some emails to our cell phones in the event of a problem.

As far as infrastructure, I was planning on submitting for a second production server to form either an Active - Active cluster or just use log shipping and leverage the second server for Reporting. I assume I need to add the cost of a SAN in order to make the cluster happen, so that may look less attractive to the sith lords (upper management). Any thoughts? I am aware that clusters pose certain challenges, but are there any third-party tools that mitigate those?

The other barrier to this is that I have no experience with clusters or log shipping, so I was also planning on adding some training to get me up to speed. Any recommendations on training (in WI/midwest)? Any additional considerations I should be aware of?

Otherwise, do you currently use any third party tools that you can't live without? Are there any packages I should consider? Do you see any big red flags that I should consider?

Any input is appreciated.

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2005-08-17 : 16:57:18
I forgot to mention that we are using SQL Server 2000 Enterprise on all of our servers.

MichaelP
Jedi Yak

2489 Posts

Posted - 2005-08-17 : 19:37:23
You don't need a SAN for clustering; all you need is DAS (direct attached storage). Basically, you take your SAN config, remove the Fibre Channel switches, and you have a DAS setup. It can even be done with external SCSI storage. Considering the number of spindles you have, you'll probably need a Fibre Channel DAS.

Talk to some hardware vendors about DAS. Most of them should offer something.

With your RAID arrays, I'd probably recommend a LOT of changes. Transaction logs need to go on a RAID 1 or RAID 1/0 array. They should not be on RAID 5 because of the write penalty that RAID 5 gives you. Data can go on RAID 5, or on RAID 1/0 if you have the money and are doing enough writes.
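
To put rough numbers on the RAID 5 write penalty, here is a minimal Python sketch. It assumes the usual rule of thumb that each logical write costs 4 disk I/Os on RAID 5 (read data, read parity, write data, write parity) and 2 on RAID 1 or RAID 1/0 (one write per mirror side). The disk count and per-disk IOPS below are made-up illustration values, not measurements from anyone's hardware in this thread, and the function name is hypothetical.

```python
# Rule-of-thumb back-end disk I/Os per logical write for each RAID level.
WRITE_PENALTY = {"RAID 1": 2, "RAID 1/0": 2, "RAID 5": 4}


def effective_iops(raid_level, disks, iops_per_disk, write_fraction):
    """Estimate host-visible IOPS for an array under a read/write mix.

    Reads are counted as one disk I/O each; writes are counted as
    WRITE_PENALTY[raid_level] disk I/Os each. Caching is ignored.
    """
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid_level]
    read_fraction = 1.0 - write_fraction
    return raw / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical log array: 4 disks at 150 IOPS each, 100% writes.
    for level in ("RAID 5", "RAID 1/0"):
        print(level, round(effective_iops(level, 4, 150, 1.0)))
```

Under those assumptions the same four disks sustain half as many log writes on RAID 5 as on RAID 1/0, which is the gap being described. Controller write cache can narrow it in practice, so treat this as a planning estimate only.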

Let me know if you have more questions. I've got a lot of background in this area. I've got a few SANs and a DAS setup that I work with.

Michael

<Yoda>Use the Search page you must. Find the answer you will. Cursors, path to the Dark Side they are. Avoid them, you must. Use Order By NewID() to get a random record you will.</Yoda>

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-08-17 : 21:05:15
Whatever you do, don't cluster with DAS. Getting close to 100% uptime involves a lot more than just clustering your servers. Clustering with DAS is a disaster waiting to happen.

If your bosses want 100% uptime (which is impossible) you are going to need to spend some dough. If they don't want to spend the money to allow you to achieve this, then you might want to consider finding another job. 99.999% uptime means about 5 minutes of downtime a year.
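
The downtime arithmetic behind the "nines" is easy to check. A minimal Python sketch, assuming a 365-day year and counting every minute of outage against the budget (planned maintenance included); the function name is just an illustration:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime -> {minutes:.1f} min/year ({minutes / 60:.2f} h)")
```

Under those assumptions, 99.999% works out to roughly 5.3 minutes a year and 99.9% to roughly 8.8 hours a year.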



If faced with this situation, I would look for the following:

1. Servers that have redundant and hot-swappable parts, including memory and CPU.
2. SAN storage with lots of redundancy. This basically means looking at a Hitachi Lightning 9900 series or EMC DMX SAN.
3. Redundant SAN switches
4. Dual HBAs in each server
5. Dual paths on your LAN with redundant switches/routers etc.
I could go on, but what is the point?

Bottom line: to get close to 100% guaranteed uptime, you will need to spend millions. Not only that, but you will have to staff up in order to support the millions of dollars in equipment and software that you will have just purchased.

Is this still what your PHBs are looking for?


-ec

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2005-08-17 : 21:29:53
quote:
"100% guarantee of on-line availability of our sites"


Tell them to take out insurance...Oh wait.. :-)

DavidM

A front-end is something that tries to violate a back-end.

MichaelP
Jedi Yak

2489 Posts

Posted - 2005-08-18 : 11:54:01
EyeChart, how is clustering with a DAS a disaster? It's worked great for me over the last few years. We have redundant paths to the data, just like a SAN setup. The only difference between what we have and a SAN is that only our cluster servers can connect to our storage, and that suits me just fine.

Is there something about DAS and clustering that I'm unaware of? Now ya got me worried :)

Michael

<Yoda>Use the Search page you must. Find the answer you will. Cursors, path to the Dark Side they are. Avoid them, you must. Use Order By NewID() to get a random record you will.</Yoda>

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-08-18 : 12:51:27
quote:
Originally posted by MichaelP

EyeChart, how is clustering with a DAS a disaster? It's worked great for me over the last few years. We have redundant paths to the data, just like a SAN setup. The only difference between what we have and a SAN is that only our cluster servers can connect to our storage, and that suits me just fine.

Is there something about DAS and clustering that I'm unaware of? Now ya got me worried :)

Michael



The problems we faced were with shared SCSI locking: both systems were attempting to access the same drive, even though these disks were controlled via MSCS. I have not experienced these types of problems with a SAN solution using MSCS. I have had this issue with EMC hardware on IBM AIX based hardware, but the locks were easily dealt with using the management interfaces to the SAN.

Also, are you able to configure multiple paths using a shared SCSI approach? We have multiple HBAs configured in our systems and we use a multipath tool to control their usage. If we have an HBA failure, or a fiber failure, or a port in the SAN switch fail we are still good to go.

A SAN solution costs big bucks, but 99.999% uptime is not an inexpensive undertaking.



-ec

MichaelP
Jedi Yak

2489 Posts

Posted - 2005-08-18 : 13:35:50
Yeah, we had multipathing with our DAS SCSI system, and we have multipathing with our DAS Fibre Channel setup.
We've recently moved to a Fibre Channel SAN in one of our environments, and it's pretty nice.

I agree that the more 9's you have in your uptime, the more dollars you spend. I've even heard things like for every 9 you add, you add ten times the cost or something crazy like that.

Michael

<Yoda>Use the Search page you must. Find the answer you will. Cursors, path to the Dark Side they are. Avoid them, you must. Use Order By NewID() to get a random record you will.</Yoda>

Kristen
Test

22859 Posts

Posted - 2005-08-18 : 15:38:28
"100% guarantee of on-line availability of our sites"

I'm brilliant at Maths, but can someone remind me how many 9's that is please?!

Kristen

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2005-08-18 : 18:52:54
Hey all, Thanks a lot for the replies. Luckily, I've confirmed that "100% guarantee of on-line availability of our sites" really means "greatly improve on our up-time without going overboard".

Is a storage configuration possible that would allow clustered sql servers to be in different physical locations? Also, any responses on the other questions I posed in my initial post?

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-08-18 : 19:28:32
quote:
Originally posted by lazerath
Is a storage configuration possible that would allow clustered sql servers to be in different physical locations? Also, any responses on the other questions I posed in my initial post?



yeah, that is possible - but it would be a little on the overboard side of things.

Look into clustering using the HP packaged cluster. They have a DL380 paired with the MSA1000 or MSA500 mini-SAN that is pretty affordable and seems to work well.

You can look at log shipping today with SQL2K and ship the logs to your remote location, or look at SQL 2K5 when it becomes available. They have a new feature called database mirroring that might be worth looking into when they ship the new release.

Here is some info on it: http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx



-ec

AjarnMark
SQL Slashing Gunting Master

3246 Posts

Posted - 2005-08-18 : 20:11:44
lazerath, on the topics of monitoring your SQL Servers, you might want to look into Lumigent's Entegra product.

As for training, how about taking a few days and joining us in Texas at the 2005 PASS Community Summit the last week in September? The basic conference is Wednesday through Friday with hundreds of presentations. I'd guess that about 1/4 of them are put on by Microsoft, and the rest by experienced users or consultants. Also, the Product Expo will have reps from the major SQL tool providers like Lumigent, Red Gate, Idera, Embarcadero, etc., where you can see demos and pick their brains for free (included with conference).

---------------------------
EmeraldCityDomains.com

Kristen
Test

22859 Posts

Posted - 2005-08-19 : 03:07:15
""100% guarantee of on-line availability of our sites" really means "greatly improve on our up-time without going overboard""

Ah ... "Several 9's" then?

"on the topics of monitoring your SQL Servers"

FWIW we use Servers Alive - which can also monitor that Web sites are alive etc - in fact the tests include Winsock, Ping, NT Services/Processes, Space available on a given UNC, URL contains some text, SQL DB available, Netware, SNMP, and an ERRORLEVEL returned by a DOS command.

On failure (and on Back Up again) specific commands can be run - we run a batch file to append the Down/Up times to a LOG. There are escalation commands after a number of successive "downs": Send EMail, Play a sound, Page you and Syslog. And you can schedule it for certain times in the day.

Oh ... and it's free for up to 10 tests (e.g. 10 servers)

EDIT: I just looked at their site and there are a load more features in the current version.

EDIT: ServersAlive can also make a web page - so you can see what's Up & Down by checking that page.
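
For a feel of what a check like this involves, here is a rough Python sketch of a TCP port test plus a Down/Up log, in the spirit of the batch-file logging described above. It is not how Servers Alive works internally; the function names, the check name and the log format are all hypothetical. Port 1433 is the default port for a default SQL Server instance.

```python
import socket
from datetime import datetime


def tcp_port_is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def log_transition(log_path, name, was_up, is_up, now=None):
    """Append a line to log_path only when the state changes; return is_up.

    Pass was_up=None on the first check so the initial state is recorded.
    """
    if was_up != is_up:
        stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
        state = "UP" if is_up else "DOWN"
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{stamp} {name} {state}\n")
    return is_up


if __name__ == "__main__":
    # Hypothetical target: a default SQL Server instance on this machine.
    print("sql-prod up:", tcp_port_is_up("localhost", 1433, timeout=1.0))
```

A port test only proves that something is listening. A fuller check would log in and run a trivial query, which is closer to the "SQL DB available" test mentioned above.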

Kristen

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2005-08-19 : 12:50:43
Michael, can you direct me to the product(s) you use on your DAS Fibre Channel setup? I'm curious about the savings (if any) between that and a SAN setup.

We are shooting for 99.9% uptime, which means about 8.8 hours a year of downtime is allowed. Would it be possible to have an Active/Active cluster at our primary datacenter and a log-shipped/mirrored warm spare at the secondary datacenter? I don't see any reason why that wouldn't work, but if someone has more knowledge on the subject and can point out flaws, I'm all ears. If we used SQL Server 2005, maybe we could use the automatic failover that exists with database mirroring.

Thanks for those product links, I'll be sure to check them out.

MichaelP
Jedi Yak

2489 Posts

Posted - 2005-08-19 : 13:36:15
We use an EMC CX200. I don't think you can buy those anymore; I think you have to get a CX300. Personally, I like the IBM FAStT200, but if I recall correctly, you can't DAS it and get the multipathing functionality. YOU WANT MULTIPATHING (i.e. two paths to the storage box from each server)!! Talk to your vendors about this to make sure. Our latest SAN is an IBM FAStT700. It's very fast and was very expensive.

As far as cost savings go, the difference between a DAS and a SAN for a two-node cluster is two Fibre Channel switches. Depending on what you get, you are looking at around $7k-10k each, so going DAS will save you around $14k-20k. Down the road, you can change your DAS to a SAN if needed.

As far as clustering goes, you want Active / Passive. That will give you about 90 seconds of downtime if you have a failure. If your apps are written to handle a "retry" on connecting to SQL Server, the apps will never know that SQL Server was down. Active / Active is so that you can have both machines running SQL Server and performing operations at the same time. The tricky part there is making your apps use both servers and making sure that those servers never use more than 50% of any resource, because if one fails over to the other node, you wouldn't have enough resources to make it perform well. If I recall, Active / Active tends to be more complicated to set up.
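
The "retry on connecting" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `connect` is a hypothetical zero-argument callable standing in for whatever your database driver uses to open a connection, the 120-second budget is simply chosen to be longer than the roughly 90-second failover mentioned above, and catching every exception is a simplification (a real app would retry only on connection errors).

```python
import time


def connect_with_retry(connect, max_wait_seconds=120.0, delay_seconds=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it succeeds or max_wait_seconds has elapsed.

    connect is any zero-argument callable that returns a connection or
    raises an exception. sleep and clock are injectable so the loop can be
    tested without actually waiting.
    """
    deadline = clock() + max_wait_seconds
    while True:
        try:
            return connect()
        except Exception:
            # Give up if one more wait would take us past the deadline.
            if clock() + delay_seconds > deadline:
                raise
            sleep(delay_seconds)
```

Usage would look like `conn = connect_with_retry(lambda: open_my_connection())`, where `open_my_connection` is your own function. Statements that were in flight when the node failed still need to be re-run by the application; the retry only covers getting reconnected.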

Hope that helps!
Michael

<Yoda>Use the Search page you must. Find the answer you will. Cursors, path to the Dark Side they are. Avoid them, you must. Use Order By NewID() to get a random record you will.</Yoda>

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-08-19 : 14:30:56
There really isn't a difference in setup between ACTIVE/PASSIVE and ACTIVE/ACTIVE. All ACTIVE/ACTIVE means is that you have another instance of SQL Server running, and it is running from the 2nd node. SQL Server clustering isn't done for performance reasons, it is done for high availability reasons. This is different from the Oracle RAC approach. SQL Server is shared-nothing, RAC is shared-everything. Anyway...

In SQL Server clustering you put one SQL instance per cluster group. Each cluster group has its own physical disk located in the shared storage. No cluster group can share a physical disk with another cluster group. You assign a cluster group to a node in your cluster. You are able to move the groups between nodes - fail over and fail back.

If you want to have an ACTIVE/ACTIVE cluster working on the same dataset, it gets complicated. I think this would be a federated server setup where you have a unioned view. I don't know anyone who is running this kind of setup, except for stuff that you see created for TPC scores and maybe custom-written applications for really big data warehouses (maybe TerraServer, but I'm not sure). The federated server stuff gets very complicated and there are some limitations with the distributed partitioned views to work around. I suggest you try to keep it simple.

Anyway, with ACTIVE/ACTIVE clustering you need to be aware of the load on your servers as Michael has pointed out. If you do have a failure and both instances are running off of one node, they may battle each other for resources and your performance could be impacted. Careful planning will help you avoid those situations.
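
The capacity planning for ACTIVE/ACTIVE can be modeled very simply. The Python sketch below treats each cluster group as a (owner node, load) pair, moves a failed node's groups to the survivor, and adds up the load. The group names, node names and load figures are hypothetical, and it assumes identical nodes whose loads add linearly, which real workloads only approximate.

```python
from collections import defaultdict


def node_loads(groups):
    """Sum load per node. groups maps group name -> (owner_node, load_percent)."""
    loads = defaultdict(float)
    for owner, load in groups.values():
        loads[owner] += load
    return dict(loads)


def fail_over(groups, failed_node, target_node):
    """Return a new mapping with failed_node's groups moved to target_node."""
    return {
        name: (target_node if owner == failed_node else owner, load)
        for name, (owner, load) in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical ACTIVE/ACTIVE layout: one SQL instance per cluster group.
    groups = {"SQLGROUP1": ("NODE1", 45.0), "SQLGROUP2": ("NODE2", 60.0)}
    after = fail_over(groups, "NODE1", "NODE2")
    print(node_loads(after))
```

In this made-up layout the surviving node would be asked for more than 100% of its capacity after a failover, which is exactly the situation the 50% guideline is meant to avoid.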

As an aside, we just implemented some HP DL385 and DL585 boxes running AMD Opteron chips. Holy shit are they fast. The 585s use the dual-core CPUs, so we effectively have a clustered 8-CPU box. The first drawback to the Opteron-based systems is that you don't have the RAID memory capability of the DL580, because the memory controller is on the CPU. That is also a positive, because RAM access occurs at the clock rate of the CPU. The second drawback is that we had to implement SQL2K SP4 even though it has some lingering performance issues. So far, no problems.

Sorry, didn't mean to sound like an advert for HP, but I'm impressed with these machines. Combine these (or the DL385 boxes) with the DAS or SAN that Michael mentioned and you have a screaming system that will scale pretty nicely - and give you the HA features needed to maintain that 99.9% uptime requirement.



-ec

MichaelP
Jedi Yak

2489 Posts

Posted - 2005-08-19 : 15:00:16
Those HP "clusters in a box" are a pretty nice setup. Those will probably suit your needs if they can hold enough disks to do what you want. The EMC box that we had only holds 15, and I'm not sure we can scale past 15 without going to a SAN. Be aware of the scalability of the disk subsystem that you buy.

Also, I'd recommend getting a professional to install, set up, and test your cluster for you. With this being your first cluster, you want to make sure it's done right. Be sure to test everything (i.e. pulling out power cables and fibre cables while the machine is on and performing I/O).

Michael

<Yoda>Use the Search page you must. Find the answer you will. Cursors, path to the Dark Side they are. Avoid them, you must. Use Order By NewID() to get a random record you will.</Yoda>

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2005-08-19 : 17:22:48
For the record, you guys rock. Thanks for all the great information.

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-08-19 : 18:12:42
quote:
Originally posted by lazerath

For the record, you guys rock. Thanks for all the great information.




good luck - keep us posted as to what you guys do.



-ec

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-08-19 : 18:42:06
quote:
Originally posted by lazerath

For the record, you guys rock. Thanks for all the great information.



I agree, have been following this thread.
Very good info!

rockmoose
   
