Hi all,
On our freshly installed vSphere 5 Update 1 environment, we are seeing latency warnings.
First of all, a short overview:
2 x HP DL380 G7 (each host has 2 x QLE2560 FC Single Port HBAs)
1 x Hitachi Unified Storage 110
Everything is running at 8Gb/s over FC. The SAN is configured as follows:
- Disk0 to Disk6 = RAID Group 000 / RAID Level: RAID5(6D+1P) = VOLUME 0200 = 3.1TB
- Disk7 to Disk13 = RAID Group 001 / RAID Level: RAID5(6D+1P) = VOLUME 0201 = 3.1TB
- Disk23 = SpareDisk
In our vSphere environment we have two datastores, "DataStore_Prod01" and "DataStore_Test01". Each datastore is 3.1TB, is formatted as VMFS5, and sits on its own RAID group. The storage is directly connected to the hosts over FC; there is no FC switch. At the moment I have 30 VMs running (15 VMs on "DataStore_Prod01" and 15 VMs on "DataStore_Test01"), and those VMs are idle most of the time.
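For reference, the NAA device IDs quoted in the warnings below map to these two datastores. I verified the mapping from the ESXi shell (these are just the verification commands as I understand them on ESXi 5.x, not a fix; output omitted):

esxcli storage vmfs extent list
esxcli storage core device list -d naa.60060e80105395f0056fc1ef000000c8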
Then suddenly I started seeing latency warnings on both hosts for both datastores. They look like this:
Device naa.60060e80105395f0056fc1ef000000c8
performance has deteriorated. I/O latency
increased from average value of 3298
microseconds to 197574 microseconds.
warning
10.07.2012 21:16:37
esx001.xxxx.xxxx
Device naa.60060e80105395f0056fc1ef000000c9
performance has deteriorated. I/O latency
increased from average value of 2182
microseconds to 277405 microseconds.
warning
10.07.2012 21:16:06
esx002.xxxx.xxxx
All our HBAs have the following firmware and driver versions:
BIOS version 3.00
FCODE version 3.15
EFI version 2.21
Flash FW version 5.04.01
Driver version 901.k1.1-14vmw
As recommended by Hitachi Data Systems (HDS) for SAS drives, I changed the LU queue depth to 32. The value of Disk.SchedNumReqOutstanding was already at 32.
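For completeness, this is roughly how I applied the change on each host (the qla2xxx module parameter and the advanced setting below are my understanding of how this is done on ESXi 5.x; the module parameter only takes effect after a host reboot):

# set the per-LUN queue depth of the QLogic driver to 32 (reboot required)
esxcli system module parameters set -m qla2xxx -p ql2xmaxqdepth=32
# verify the parameter afterwards
esxcli system module parameters list -m qla2xxx
# check the global outstanding-requests setting (was already 32 in our case)
esxcfg-advcfg -g /Disk/SchedNumReqOutstanding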
This is what VMware Support has found out:
hostname: ESX001
VMware ESXi 5.0.0 build-623860
# We have 4 vmkernel files
# Time range: 2012-05-25 06:16:54 - 2012-06-20 14:02:51, unique log entries for 22 different days
# The error: "Cmd xxx to dev xxx failed" has been reported 60 times in that period
# The sum of all SCSI error codes in all 4 vmkernel log files (possible/valid sense data, no mpx devices)
H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x0 # FREQUENCY: 1
H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0 # FREQUENCY: 2
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0 # FREQUENCY: 6
H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x25 0x0 # FREQUENCY: 6
H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0 # FREQUENCY: 9
H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0 # FREQUENCY: 12
H:0x0 D:0x2 P:0x5 Possible sense data: 0x0 0x0 0x0 # FREQUENCY: 12
# The following LUNs were reported in combination with above SCSI error codes
naa.60060e80105395f0056fc1ef000000c8 # FREQUENCY: 20
naa.60060e80105395f0056fc1ef000000c9 # FREQUENCY: 28
# That translates into vmfs datastore names (only vmfs, not NFS or RDM)
naa.60060e80105395f0056fc1ef000000c8:1 DataStore_Prod01
naa.60060e80105395f0056fc1ef000000c9:1 DataStore_Test01
# The SCSI error codes in the vmkernel logs have been observed during 3 different days
2012-05-25 # FREQUENCY: 7
2012-06-13 # FREQUENCY: 15
2012-06-20 # FREQUENCY: 26
Column legend:
- Host status: these codes potentially come from the firmware on the host adapter
- Device status: SCSI status codes that appear in the status byte returned when processing of a command completes
- Sense key: SCSI sense keys appear in the sense data available when a command completes with a CHECK CONDITION status
- ASC + ASCQ: Additional Sense Code + Additional Sense Code Qualifier
naa.60060e80105395f0056fc1ef000000c8
Host: 0x00 DID_OK Device: 02h CHECK CONDITION Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 2h NOT READY ASC: 3Ah MEDIUM NOT PRESENT
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 5h ILLEGAL REQUEST ASC: 25h LOGICAL UNIT NOT SUPPORTED
naa.60060e80105395f0056fc1ef000000c9
Host: 0x00 DID_OK Device: 02h CHECK CONDITION Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 5h ILLEGAL REQUEST ASC: 24h INVALID FIELD IN CDB
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 5h ILLEGAL REQUEST ASC: 25h LOGICAL UNIT NOT SUPPORTED
----------------
hostname: ESX002
VMware ESXi 5.0.0 build-623860
# We have 4 vmkernel files
# Time range: 2012-05-25 05:49:08 - 2012-06-20 14:05:17, unique log entries for 21 different days
# The error: "Cmd xxx to dev xxx failed" has been reported 67 times in that period
# The sum of all SCSI error codes in all 4 vmkernel log files (possible/valid sense data, no mpx devices)
H:0x0 D:0x2 P:0x5 Possible sense data: 0x2 0x3a 0x1 # FREQUENCY: 2
H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x24 0x0 # FREQUENCY: 2
H:0x5 D:0x0 P:0x0 Possible sense data: 0x2 0x3a 0x1 # FREQUENCY: 3
H:0x5 D:0x0 P:0x0 Possible sense data: 0x5 0x25 0x0 # FREQUENCY: 5
H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x25 0x0 # FREQUENCY: 6
H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0 # FREQUENCY: 7
H:0x0 D:0x2 P:0x0 Valid sense data: 0xe 0x1d 0x0 # FREQUENCY: 17
H:0x0 D:0x2 P:0x5 Possible sense data: 0x0 0x0 0x0 # FREQUENCY: 18
# The following LUNs were reported in combination with above SCSI error codes
naa.60060e80105395f0056fc1ef000000c8 # FREQUENCY: 18
naa.60060e80105395f0056fc1ef000000c9 # FREQUENCY: 42
# That translates into vmfs datastore names (only vmfs, not NFS or RDM)
naa.60060e80105395f0056fc1ef000000c8:1 DataStore_Prod01
naa.60060e80105395f0056fc1ef000000c9:1 DataStore_Test01
# The SCSI error codes in the vmkernel logs have been observed during 6 different days
2012-05-25 # FREQUENCY: 4
2012-05-29 # FREQUENCY: 4
2012-06-06 # FREQUENCY: 4
2012-06-14 # FREQUENCY: 7
2012-06-13 # FREQUENCY: 13
2012-06-20 # FREQUENCY: 28
# The following SCSI commands were reported to have failed
6 times 0x4d = LOG SENSE
17 times 0xc0
===> Per Lun/Per Error:
Column legend:
- Host status: these codes potentially come from the firmware on the host adapter
- Device status: SCSI status codes that appear in the status byte returned when processing of a command completes
- Sense key: SCSI sense keys appear in the sense data available when a command completes with a CHECK CONDITION status
- ASC + ASCQ: Additional Sense Code + Additional Sense Code Qualifier
naa.60060e80105395f0056fc1ef000000c8
Host: 0x00 DID_OK Device: 02h CHECK CONDITION Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x00 DID_OK Device: 02h CHECK CONDITION Sense: 2h NOT READY ASC: 3Ah MEDIUM NOT PRESENT - TRAY CLOSED
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 2h NOT READY ASC: 3Ah MEDIUM NOT PRESENT - TRAY CLOSED
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 5h ILLEGAL REQUEST ASC: 25h LOGICAL UNIT NOT SUPPORTED
naa.60060e80105395f0056fc1ef000000c9
Host: 0x00 DID_OK Device: 02h CHECK CONDITION Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x00 DID_OK Device: 02h CHECK CONDITION Sense: 2h NOT READY ASC: 3Ah MEDIUM NOT PRESENT - TRAY CLOSED
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 0h NO SENSE ASC: 00h NO ADDITIONAL SENSE INFORMATION
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 5h ILLEGAL REQUEST ASC: 24h INVALID FIELD IN CDB
Host: 0x05 DID_ABORT Device: 00h GOOD Sense: 5h ILLEGAL REQUEST ASC: 25h LOGICAL UNIT NOT SUPPORTED
Conclusion by VMware: it appears to be an issue with the storage, as both hosts are receiving check conditions and aborts from the device (the storage array).
Also, after changing the queue depth to 32, I am still getting latency warnings. What really drives me crazy is that I can't reproduce them: even under high I/O I never get any warnings. They also always seem to appear between xx:15 and xx:16, but there is no scheduled job, neither on the storage array nor in the vSphere environment, that could cause that kind of impact.
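My next idea for catching whatever happens around xx:15 is to let esxtop run in batch mode across that window and look at the device latencies (DAVG/KAVG/GAVG per cmd) afterwards. A rough sketch; the interval, iteration count and output path are just an example covering one hour:

# capture one sample every 15 seconds for one hour into a CSV for later analysis
esxtop -b -d 15 -n 240 > /vmfs/volumes/DataStore_Test01/esxtop-esx001.csv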
HDS Support told me that there is no problem with the storage array. We did find a wrongly configured cache size on the SAN, but even after correcting it we are still getting latency warnings, not all the time, but sporadically (once a day or so).
Any suggestion on how to solve this problem is much appreciated.
If you have any questions regarding my configuration, don't hesitate to contact me.
Best regards,
Marc