Article: 76746 of comp.arch
From: mash@mash.engr.sgi.com (John R. Mashey)
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: Value proposition of E10000 (was: We need a successor to the 8400 TL)
Date: 31 Jan 1999 07:05:21 GMT
Organization: Silicon Graphics, Inc.
Lines: 194
Message-ID: <790vbh$c4$1@murrow.corp.sgi.com>
References: <3695FE68.B587EC5F@strategypartners.com>
    <77olil$jhi$1@murrow.corp.sgi.com> <77rvc0$27k@gwis2.circ.gwu.edu>
    <874spngyl0.fsf@mihalis.ix.netcom.com> <784ku8$72j$2@murrow.corp.sgi.com>

In article , alanc@west.sun.com (Alan Charlesworth) writes:

|> One way to see what large systems are used for is to look at the current
|> (Nov 98) Top 500 list at http://www.netlib.org/benchmark/top500.html.
|>
|> The large E10000s and IBM SPs on the list are mainly used by industrial
|> applications, while the SGI Origins are more used by academics and
|> researchers. These three systems make up about 2/3 of the list.

(This statement is true, but as we'll see, perhaps does not mean what it
seems to mean, when the analysis is done ... there are 25000+ Origins in
the world, and a lot of them are used in industrial applications...)

Actually, this is a good illustration of why one must be careful to
understand what benchmarks mean. In this particular case, what it means
is that it is sad, but true, that the meaningfulness of the Top500 list
is degrading from wherever it used to be.

1) The Top500 list was started as an assessment of supercomputers, which
are mostly bought for floating-point calculations. It uses LINPACK as a
metric. LINPACK may or may not be representative of floating-point
computation in general (and the Top500 list is careful to say this), but
it is certainly targeted to measure floating-point performance.

2) Over time, the Top500 list has accumulated many massively-parallel,
SMP, and ccNUMA systems based on microprocessors.
Once upon a time, about the only ones that made the list were either MPP,
or microprocessors tuned for floating-point (like MIPS R8000-based Power
Challenges, and some IBM POWER versions).

3) At this point, there exist machines whose LINPACK scores get them on
the list ... whether or not those particular machines *ever* do any
significant floating-point computation whatsoever. I'd characterize the
machines on the list as follows:

a) Actually used for technical, floating-point computing, i.e., what the
list is supposed to be about. Most vector machines fit this [but there
are some that get used for other things].

b) Never, or hardly ever, used for floating-point computation, but if it
were, its LINPACK score would be high enough to get on the list, so it is
included on the list. For example, a machine located at Oracle for
porting is probably not doing LINPACK :-) There are machines that are
doing computationally-intense codes, but not floating-point.

c) Mixed use: floating-point, integer computation, DBMS.

d) Ambiguous: from the listing, it is impossible to tell. For example,
some financial systems spend all day doing FP crunching, some never do
any, and sometimes they're mixed ... but if there are 3 machines, all
will appear on the list.

4) Now, the Top500 folks quite reasonably do not want to get into the
difficult arbitration job of sorting out d) into a), b), and c), and
dropping b), and maybe c). [It is of course quite arguable what makes
any sense. People also argue about the correlation of LINPACK with other
codes.]

Unfortunately, whereas the list *used* to give some sense of what was
happening in high-performance floating-point computing (a), the more the
list gets filled with b), c), and d), the less useful it becomes for that
purpose. (Remember, the list is based on LINPACK.) There is a lot of this
going on: vendors, of course, are motivated to have
    1) Many machines on the list, and/or
    2) High-ranked machines.

5) There are, of course, other anomalous/ambiguous issues. Some of the
machines listed as single machines are actually clusters of N machines,
and the site owners get to choose whether to call these 1 machine or N
machines. For example, if LANL chose to call ASCI Blue Mountain 48 128P
Origins, the bottom 47 machines would fall off the list:
    33 Suns
     6 IBMs
     5 SGIs
     2 Compaqs
     1 NEC

This is not a criticism of the Top500 effort, which is a lot of work ...
but it does say one needs to be careful about over-interpreting the data.

6) It is worth perusing the list itself to try to understand what is
actually going on. Given the ambiguities already mentioned, and lack of
time, I'm not going to try to do an exhaustive analysis, but I propose
the following, based on *knowing* what many of the Origins are actually
doing, having a reasonable guess at the general nature of the work on
some classified machines, and some guesses about machine usage patterns
in some industries.

a) Most (80%?) of the Origins on the list are used for classic HPC (a),
although a few are b) or c).

b) Some of the SPs are really doing data mining or other commercial,
non-HPC things.

c) A majority of the Suns appear to fit categories b) and c), although it
is somewhat hard to tell; many are in the ambiguous d) ... In some cases,
although ambiguous on the face of it, I've heard of specific
applications, and hence can guess.

| (Good summary of E10000 features)...

Perhaps alanc can categorize these with publicly-available information.
My guesses are shown below, noting that I'm guessing a, b, c, but the
count for d shows the level of uncertainty (higher = more uncertain):

|> Sun E10000 Use            Systems
|> Industry Aerospace              2
        d:2   guess a:1, c:1
|> Industry Chemistry              2
        d:2   guess a:1, c:1
|> Industry Database               9
        b:9
|> Industry Electronics            6
        d:6   guess a:2, b:2, c:2
|> Industry Energy                 2
        d:2   guess a:1, b:1
|> Industry Finance               22 (Commerzbank 8, DeutscheMorgan 2, TorontoExchange 2)
        d:22  guess a:10, b:10, c:2
|> Industry Geophysics             3
        d:3   guess a:1, b:2
|> Industry Inform Proc            3
        b:3
|> Industry Manufacturing          3
        d:3   guess a:1, b:2
|> Industry Media                  1
        b:1
|> Industry Pharmaceutics          2
        d:2   guess c:2
|> Industry Telecom               28 (AT&T 12, GTE 3, Bell Canada 3, NTT 2)
        d:28  guess a:4, b:20, c:4
|> Industry Transportation         7 (Delta Airlines 2, Sabre 1)
        d:7   guess a:2, b:4, c:1
|> Industry WWW                    2 (eBay has 1)
        b:2
|> Industry Misc                   2
        d:2   guess c:2
|> Government                      3 (IRS 1)
        d:3   guess a:1, b:1, c:1
|> Classified                     15
        d:15  guess a:5, b:5, c:5
|> Research                        4
        d:4   guess a:2, b:1, c:1
|> Academic                        7
        d:7
|> Vendor                          1
        [Actually, I counted 2: Portland & Japan]
|> --
|> E10000 total                  124 (32 procs min)
        a: 33  b: 68  c: 24

(Obviously, there's plenty of guesswork here, and I'll be happy to be
proved wrong ... but from knowing some of the specific deals, I'd guess
that ~80% of the E10000s aren't actually doing much FP computation.)

On the IBM SPs, most of the categories are actually doing FP-compute, but
I'd guess that Database, some of Electronics, some of Finance, and much
of Telecom aren't. I.e., I'd guess ~20-30 systems aren't.
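
For anyone who wants to re-check or revise the E10000 guesses above, here
is a minimal Python sketch that tallies the per-row a/b/c guesses. The
names E10000_GUESSES and tally are made up for this illustration; the
numbers are simply the row annotations as transcribed, and rows that carry
no a/b/c guess (Academic, Vendor) are left unclassified rather than
assigned to a class.

```python
from collections import Counter

# (category, systems listed, {usage class: guessed count}), transcribed
# from the annotated E10000 table. An empty dict means the row has no
# a/b/c guess attached. These names are hypothetical, for illustration.
E10000_GUESSES = [
    ("Industry Aerospace", 2, {"a": 1, "c": 1}),
    ("Industry Chemistry", 2, {"a": 1, "c": 1}),
    ("Industry Database", 9, {"b": 9}),
    ("Industry Electronics", 6, {"a": 2, "b": 2, "c": 2}),
    ("Industry Energy", 2, {"a": 1, "b": 1}),
    ("Industry Finance", 22, {"a": 10, "b": 10, "c": 2}),
    ("Industry Geophysics", 3, {"a": 1, "b": 2}),
    ("Industry Inform Proc", 3, {"b": 3}),
    ("Industry Manufacturing", 3, {"a": 1, "b": 2}),
    ("Industry Media", 1, {"b": 1}),
    ("Industry Pharmaceutics", 2, {"c": 2}),
    ("Industry Telecom", 28, {"a": 4, "b": 20, "c": 4}),
    ("Industry Transportation", 7, {"a": 2, "b": 4, "c": 1}),
    ("Industry WWW", 2, {"b": 2}),
    ("Industry Misc", 2, {"c": 2}),
    ("Government", 3, {"a": 1, "b": 1, "c": 1}),
    ("Classified", 15, {"a": 5, "b": 5, "c": 5}),
    ("Research", 4, {"a": 2, "b": 1, "c": 1}),
    ("Academic", 7, {}),
    ("Vendor", 1, {}),
]


def tally(rows):
    """Return (systems listed, per-class guess totals, unclassified count)."""
    listed = sum(count for _, count, _ in rows)
    by_class = Counter()
    for _, _, guess in rows:
        by_class.update(guess)  # adds each class's guessed count
    unclassified = listed - sum(by_class.values())
    return listed, dict(by_class), unclassified


if __name__ == "__main__":
    listed, by_class, unclassified = tally(E10000_GUESSES)
    print("systems listed:", listed)
    print("guessed by class:", by_class)
    print("no a/b/c guess:", unclassified)
```

Since the unguessed rows stay unclassified, the per-row sums need not
equal the summary line of the table; editing a row's dict and re-running
shows how sensitive the split is to any one guess.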

|> IBM SP use                Systems
|> Industry Automotive             1
|> Industry Chemistry              3
|> Industry Database               4
|> Industry Defense                1
|> Industry Electronics            6
|> Industry Finance                7
|> Industry Geophysics             9
|> Industry Manufacturing          1
|> Industry Software               1
|> Industry Telecom               18
|> Industry Transportation         1
|> Industry Misc                  13
|> Government                      1
|> Classified                      4
|> Academic                       15
|> Research                       15
|> Vendor                          4
|> ---
|> IBM SP total                  104 (40 procs min)

Now, perhaps for a slightly contentious remark:

a) One could draw the conclusion from the top-level data that Origins &
SPs are used for academic & research HPC, whereas Sun E10000s dominate
industrial HPC.

b) But actually, what I think is going on is that:

    1) Industrial HPC (floating-point) customers buy SGIs, HPs, IBMs,
    Alphas, mostly, although some Sun-oriented shops may buy E10000s (but
    more likely, clusters of smaller Suns). In general, industrial
    customers tend not to push on CPU-count as fast as government and
    research sites.

    2) *Most* of the E10000s on the list aren't doing classic HPC at all,
    but they're big enough that their LINPACK numbers get them onto the
    list. Although a 32P Origin2000's LINPACK number is not high enough
    to make the list, it does better on SPECfp_rate, and on many real
    codes, such that a 32P O2000 and a 64P E10000 overlap in throughput
    on such codes ... but the former costs less, and industrial customers
    buy lots of them, but they don't ever show up on the Top500 list.

Now, of course I might be biased, and none of this is against the letter
of the rules, and I guess usefulness-degradation is a fact of life, but
it's still too bad, as the Top500 list's meaningfulness is going
downhill. If the list is going to fill up with machines that are really
running DBMS, maybe the list should use some other metric than LINPACK,
and it would probably turn out to be fair if the list were dominated by
big IBM mainframes.

--
-john mashey    DISCLAIMER:
EMAIL:  mash@sgi.com  DDD: 650-933-3090  FAX: 650-933-4392
USPS:   Silicon Graphics/Cray Research 40U-005,
        2011 N. Shoreline Blvd, Mountain View, CA 94043-1389